Preface


It is a pleasure to accept the invitation of Harcourt/Academic Press to publish 
a second edition. The first edition has been used mainly in graduate courses in 
measure and probability, offered by departments of mathematics and statistics 
and frequently taken by engineers. We have prepared the present text with 
this audience in mind, and the title has been changed from Real Analysis and 
Probability to Probability and Measure Theory to reflect the revisions we have 
made. 

Chapters 1 and 2 develop the fundamentals of measure and integration
theory. Included are several results that are crucial in constructing the
foundations of probability: the Radon-Nikodym theorem, the product measure
theorem, the Kolmogorov extension theorem, and the theory of weak convergence
of measures. We remain convinced that it is best to assemble a complete set of
measure-theoretic tools before going into probability, rather than try to
develop both areas simultaneously. The gain in efficiency far outweighs any
temporary loss in motivation. Those who wish to reach probability as quickly
as possible may omit Chapter 3, which gives a brief introduction to functional
analysis, and Section 2.3, which gives some applications to real analysis. In
addition, instructors may wish to summarize or sketch some of the intricate
constructions in Sections 1.3, 1.4, and 2.7.

The study of probability begins with Chapter 4, which offers a summary of
an undergraduate probability course from a measure-theoretic point of view.
Chapter 5 is concerned with the general concept of conditional probability
and expectation. The approach to problems that involve conditioning, given
events of probability zero, is the gateway to many areas of probability theory.
Chapter 6 deals with strong laws of large numbers, first from the classical
viewpoint, and then via martingale theory. Basic properties and applications
of martingale sequences are developed systematically. Chapter 7 considers
the central limit problem, emphasizing the fundamental role of Prokhorov’s
weak compactness theorem. The last two sections of this chapter cover some
material (not in the first edition) of special interest to statisticians: Slutsky’s
theorem, the Skorokhod construction, convergence of transformed sequences,
and a k-dimensional central limit theorem.




Chapters 8 and 9 have been added in the second edition, and should be of
interest to the entire prospective audience: mathematicians, statisticians, and
engineers. Chapter 8 covers ergodic theory, which is developed far enough
so that connections with information theory are clearly visible. The
Shannon-McMillan theorem is proved and the isomorphism problem for Bernoulli
shifts is discussed. Chapter 9 treats the one-dimensional Brownian motion
in detail, and then introduces stochastic integrals and the Itô differentiation
formula.

To make room for the new material, the appendix on general topology 
and the old Chapter 4 on the interplay between measure theory and topology 
have been removed, along with the section on topological vector spaces in 
Chapter 3. We assume that the reader has had a course in basic analysis and is 
familiar with metric spaces, but not with general topology. All the necessary 
background appears in Real Variables With Basic Metric Space Topology by 
Robert B. Ash, IEEE Press, 1993. (The few exercises that require additional 
background are marked with an asterisk.) 

It is theoretically possible to read the text without any prior exposure to
probability, picking up the necessary equipment in Chapter 4. But we expect
that in practice, almost all readers will have taken a standard undergraduate
probability course. We believe that discrete time, discrete state Markov
chains, and random walks are best covered in a second undergraduate
probability course, without measure theory. But instructors and students usually
find this area appealing, and we discuss the symmetric random walk on
$\mathbb{R}^k$ in Appendix 1.

Problems are given at the end of each section. Fairly detailed solutions are
given to many problems, and instructors may obtain solutions to those
problems in Chapters 1-8 not worked out in the text by writing to the publisher.

Catherine Doléans-Dade wrote Chapter 9, and offered valuable advice and
criticism for the other chapters. Mel Gardner kindly allowed some material
from Topics in Stochastic Processes by Ash and Gardner to be used in
Chapter 8. We appreciate the encouragement and support provided by the staff at
Harcourt/Academic Press.


Robert B. Ash 
Catherine Doléans-Dade
Urbana, Illinois, 1999 


Summary of Notation 


We indicate here the notational conventions to be used throughout the book. 
The numbering system is standard; for example, 2.7.4 means Chapter 2, 
Section 7, Part 4. In the appendices, the letter A is used; thus A2.3 means 
Part 3 of Appendix 2. 

The symbol □ is used to mark the end of a proof.


1 Sets 

If $A$ and $B$ are subsets of a set $\Omega$, $A \cup B$ will denote the union of $A$ and $B$,
and $A \cap B$ the intersection of $A$ and $B$. The union and intersection of a family
of sets $A_i$ are denoted by $\bigcup_i A_i$ and $\bigcap_i A_i$. The complement of $A$ (relative to
$\Omega$) is denoted by $A^c$.

The statement “$B$ is a subset of $A$” is denoted by $B \subset A$; the inclusion need
not be proper, that is, we have $A \subset A$ for any set $A$. We also write $B \subset A$ as
$A \supset B$, to be read “$A$ is an overset (or superset) of $B$.”

The notation $A - B$ will always mean, unless otherwise specified, the set of
points that belong to $A$ but not to $B$. It is referred to as the difference between
$A$ and $B$; a proper difference is a set $A - B$, where $B \subset A$.

The symmetric difference between $A$ and $B$ is by definition the union of
$A - B$ and $B - A$; it is denoted by $A \,\Delta\, B$.

If $A_1 \subset A_2 \subset \cdots$ and $\bigcup_{n=1}^{\infty} A_n = A$, we say that the $A_n$ form an increasing
sequence of sets (increasing to $A$) and write $A_n \uparrow A$. Similarly, if
$A_1 \supset A_2 \supset \cdots$ and $\bigcap_{n=1}^{\infty} A_n = A$, we say that the $A_n$ form a decreasing
sequence of sets (decreasing to $A$) and write $A_n \downarrow A$.

The word “includes” will always imply a subset relation, and the word
“contains” a membership relation. Thus if $\mathcal{C}$ and $\mathcal{D}$ are collections of sets,
“$\mathcal{C}$ includes $\mathcal{D}$” means that $\mathcal{D} \subset \mathcal{C}$. Equivalently, we may say that $\mathcal{C}$ contains
all sets in $\mathcal{D}$, in other words, each $A \in \mathcal{D}$ is also a member of $\mathcal{C}$.

A countable set is one that is either finite or countably infinite.

The empty set $\emptyset$ is the set with no members. The sets $A_i$, $i \in I$, are disjoint
if $A_i \cap A_j = \emptyset$ for all $i \ne j$.
2 Real Numbers

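The set of real numbers will be denoted by $\mathbb{R}$, and $\mathbb{R}^n$ will denote
$n$-dimensional Euclidean space. In $\mathbb{R}$, the interval $(a, b]$ is defined as
$\{x \in \mathbb{R}: a < x \le b\}$, and $(a, \infty)$ as $\{x \in \mathbb{R}: x > a\}$; other types of intervals
are defined similarly. If $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ are points in $\mathbb{R}^n$,
$a \le b$ will mean $a_i \le b_i$ for all $i$. The interval $(a, b]$ is defined as
$\{x \in \mathbb{R}^n: a_i < x_i \le b_i,\ i = 1, \ldots, n\}$, and other types of intervals are defined similarly.

The set of extended real numbers is the two-point compactification
$\mathbb{R} \cup \{\infty\} \cup \{-\infty\}$, denoted by $\overline{\mathbb{R}}$; the set of $n$-tuples $(x_1, \ldots, x_n)$, with each
$x_i \in \overline{\mathbb{R}}$, is denoted by $\overline{\mathbb{R}}^n$. We adopt the following rules of arithmetic in $\overline{\mathbb{R}}$:

$$a + \infty = \infty + a = \infty, \qquad a - \infty = -\infty + a = -\infty, \qquad a \in \mathbb{R},$$

$$\infty + \infty = \infty, \qquad -\infty - \infty = -\infty \qquad (\infty - \infty \text{ is not defined}),$$

$$b \cdot \infty = \infty \cdot b = \begin{cases} \infty & \text{if } b \in \overline{\mathbb{R}},\ b > 0, \\ -\infty & \text{if } b \in \overline{\mathbb{R}},\ b < 0, \end{cases}$$

$$\frac{a}{\infty} = \frac{a}{-\infty} = 0, \qquad a \in \mathbb{R} \qquad \left(\frac{\infty}{\infty} \text{ is not defined}\right),$$

$$0 \cdot \infty = \infty \cdot 0 = 0.$$

The rules are convenient when developing the properties of the abstract
Lebesgue integral, but it should be emphasized that $\overline{\mathbb{R}}$ is not a field under
these operations.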
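
One point where these conventions part ways with ordinary floating-point arithmetic is the product $0 \cdot \infty$, which the text sets equal to $0$ while IEEE arithmetic returns NaN. The following Python sketch shows one way the rules might be encoded. The helper names `ext_add` and `ext_mul` are hypothetical, and treating $0 \cdot (-\infty)$ as $0$ as well is an assumption made here for symmetry.

```python
import math

INF = math.inf

def ext_mul(a: float, b: float) -> float:
    """Product in the extended reals, with the convention 0 * (+/-inf) = 0."""
    if a == 0 or b == 0:
        return 0.0          # overrides the floating-point result (NaN)
    return a * b            # sign rules for b * inf are already correct

def ext_add(a: float, b: float) -> float:
    """Sum in the extended reals; inf - inf is deliberately left undefined."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("inf - inf is not defined")
    return a + b
```

For instance, `ext_mul(0, INF)` returns `0.0`, while the built-in expression `0 * INF` evaluates to NaN.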

Unless otherwise specified, positive means (strictly) greater than zero, and 
nonnegative means greater than or equal to zero. 

The set of complex numbers is denoted by $\mathbb{C}$, and the set of $n$-tuples of
complex numbers by $\mathbb{C}^n$.


3 Functions

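If $f$ is a function from $\Omega$ to $\Omega'$ (written as $f: \Omega \to \Omega'$) and $B \subset \Omega'$,
the preimage of $B$ under $f$ is given by $f^{-1}(B) = \{\omega \in \Omega: f(\omega) \in B\}$.
It follows from the definition that $f^{-1}(\bigcup_i B_i) = \bigcup_i f^{-1}(B_i)$,
$f^{-1}(\bigcap_i B_i) = \bigcap_i f^{-1}(B_i)$, $f^{-1}(A - B) = f^{-1}(A) - f^{-1}(B)$; hence $f^{-1}(A^c) = [f^{-1}(A)]^c$.
If $\mathcal{C}$ is a class of sets, $f^{-1}(\mathcal{C})$ means the collection of sets $f^{-1}(B)$, $B \in \mathcal{C}$.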
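
As an informal check of the preimage identities, the following Python sketch verifies them on a small finite example. The function `preimage`, the map $f(\omega) = \omega^2$, and the particular sets are illustrative assumptions, not part of the text.

```python
def preimage(f, domain, B):
    """Return the preimage {w in domain : f(w) in B} as a frozenset."""
    return frozenset(w for w in domain if f(w) in B)

# Illustrative choices: Omega = {-3, ..., 3}, f(w) = w^2, Omega' = f(Omega).
omega = frozenset(range(-3, 4))
f = lambda w: w * w
omega_prime = frozenset(f(w) for w in omega)

B1 = frozenset({0, 1})
B2 = frozenset({1, 4})

def identities_hold():
    """Check the union, intersection, difference, and complement identities."""
    inv = lambda B: preimage(f, omega, B)
    return (
        inv(B1 | B2) == inv(B1) | inv(B2)
        and inv(B1 & B2) == inv(B1) & inv(B2)
        and inv(B1 - B2) == inv(B1) - inv(B2)
        and inv(omega_prime - B1) == omega - inv(B1)
    )
```

A finite check of this kind proves nothing in general, but it is a quick way to catch a misremembered identity; the analogous statements for direct images (for example, with intersections) can fail.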

If $f: \mathbb{R} \to \mathbb{R}$, $f$ is increasing iff $x < y$ implies $f(x) \le f(y)$; decreasing iff
$x < y$ implies $f(x) \ge f(y)$. Thus, “increasing” and “decreasing” do not have
the strict connotation. If $f_n: \Omega \to \mathbb{R}$, $n = 1, 2, \ldots$, the $f_n$ are said to form
an increasing sequence iff $f_n(\omega) \le f_{n+1}(\omega)$ for all $n$ and $\omega$; a decreasing
sequence is defined similarly.
If $f$ and $g$ are functions from $\Omega$ to $\mathbb{R}$, statements such as $f \le g$ are always
interpreted as holding pointwise, that is, $f(\omega) \le g(\omega)$ for all $\omega \in \Omega$. Similarly,
if $f_i: \Omega \to \mathbb{R}$ for each $i \in I$, $\sup_i f_i$ is the function whose value at $\omega$ is
$\sup\{f_i(\omega): i \in I\}$.

If $f_1, f_2, \ldots$ form an increasing sequence of functions with limit $f$ [that
is, $\lim_{n \to \infty} f_n(\omega) = f(\omega)$ for all $\omega$], we write $f_n \uparrow f$. (Similarly, $f_n \downarrow f$ is
used for a decreasing sequence.)

Sometimes, a set such as $\{\omega \in \Omega: f(\omega) \le g(\omega)\}$ is abbreviated as $\{f \le g\}$;
similarly, the preimage $\{\omega \in \Omega: f(\omega) \in B\}$ is written as $\{f \in B\}$.

If $A \subset \Omega$, the indicator of $A$ is the function defined by $I_A(\omega) = 1$ if $\omega \in A$
and by $I_A(\omega) = 0$ if $\omega \notin A$. The phrase “characteristic function” is often used
in the literature, but we shall not adopt this term here.

If $f$ is a function of two variables $x$ and $y$, the symbol $f(x, \cdot)$ is used for
the mapping $y \to f(x, y)$ with $x$ fixed.

The composition of two functions $X: \Omega \to \Omega'$ and $f: \Omega' \to \Omega''$ is denoted
by $f \circ X$ or $f(X)$.

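If $f: \Omega \to \mathbb{R}$, the positive and negative parts of $f$ are defined by
$f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$, that is,

$$f^+(\omega) = \begin{cases} f(\omega) & \text{if } f(\omega) \ge 0, \\ 0 & \text{if } f(\omega) < 0, \end{cases}$$

$$f^-(\omega) = \begin{cases} -f(\omega) & \text{if } f(\omega) \le 0, \\ 0 & \text{if } f(\omega) > 0. \end{cases}$$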
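
The definitions of the positive and negative parts can be sketched in Python as below. The sketch also checks two standard consequences that are not stated in this passage, namely $f = f^+ - f^-$ and $|f| = f^+ + f^-$; the function `f` and the sample points are arbitrary illustrative choices.

```python
def positive_part(f):
    """Return f^+ = max(f, 0) as a function."""
    return lambda w: max(f(w), 0)

def negative_part(f):
    """Return f^- = max(-f, 0) as a function."""
    return lambda w: max(-f(w), 0)

# An arbitrary real-valued function, chosen only for illustration.
f = lambda w: w**3 - w
f_plus, f_minus = positive_part(f), negative_part(f)
sample_points = [-2, -0.5, 0, 0.5, 2]

def decomposition_holds():
    """Check f = f^+ - f^- and |f| = f^+ + f^- at the sample points."""
    return all(
        f_plus(w) - f_minus(w) == f(w) and f_plus(w) + f_minus(w) == abs(f(w))
        for w in sample_points
    )
```

Both parts are nonnegative, and at each point at most one of them is nonzero, which is why the checks hold exactly even in floating point.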


4 Topology

A metric space is a set $\Omega$ with a function $d$ (called a metric) from $\Omega \times \Omega$
to the nonnegative reals, satisfying $d(x, y) \ge 0$, $d(x, y) = 0$ iff $x = y$,
$d(x, y) = d(y, x)$, and $d(x, z) \le d(x, y) + d(y, z)$. If $d(x, y)$ can be $0$ for $x \ne y$, but
$d$ satisfies the remaining properties, $d$ is called a pseudometric (the term
semimetric is also used in the literature).

A ball (or open ball) in a metric or pseudometric space is a set of the form
$B(x, r) = \{y \in \Omega: d(x, y) < r\}$ where $x$, the center of the ball, is a point of
$\Omega$, and $r$, the radius, is a positive real number. A closed ball is a set of the
form $\overline{B}(x, r) = \{y \in \Omega: d(x, y) \le r\}$.

Sequences in $\Omega$ are denoted by $\{x_n, n = 1, 2, \ldots\}$. The term “lower
semicontinuous” is abbreviated LSC, and “upper semicontinuous” is abbreviated
USC.

No knowledge of general topology (beyond metric spaces) is assumed, 
and the few comments that refer to general topological spaces can safely 
be ignored. 
5 Vector Spaces

The terms “vector space” and “linear space” are synonymous. All vector 
spaces are over the real or complex field, and the complex field is assumed 
unless the term “real vector space” is used. 

A Hamel basis for a vector space $L$ is a maximal linearly independent subset
$B$ of $L$. (Linear independence means that if $x_1, \ldots, x_n \in B$, $n = 1, 2, \ldots$, and
$c_1, \ldots, c_n$ are scalars, then $\sum_{i=1}^{n} c_i x_i = 0$ iff all $c_i = 0$.) Alternatively, a Hamel
basis is a linearly independent subset $B$ with the property that each $x \in L$ is a
finite linear combination of elements in $B$. [An orthonormal basis for a Hilbert
space (Chapter 3) is a different concept.]

The terms “subspace” and “linear manifold” are synonymous, each referring 
to a subset M of a vector space L that is itself a vector space under the 
operations of addition and scalar multiplication in L. If there is a metric on L 
and M is a closed subset of L, then M is called a closed subspace. 

If B is an arbitrary subset of L, the linear manifold generated by B, denoted 
by L(B), is the smallest linear manifold containing all elements of B, that 
is, the collection of finite linear combinations of elements of B. Assuming a 
metric on L, the space spanned by B, denoted by S(B), is the smallest closed 
subspace containing all elements of B. Explicitly, S(B) is the closure of L(B). 


6 Zorn’s Lemma

A partial ordering on a set $S$ is a relation “$\le$” that is
(1) reflexive: $a \le a$;
(2) antisymmetric: if $a \le b$ and $b \le a$, then $a = b$; and
(3) transitive: if $a \le b$ and $b \le c$, then $a \le c$.

(All elements $a, b, c$ belong to $S$.)

If $C \subset S$, $C$ is said to be totally ordered iff for all $a, b \in C$, either $a \le b$ or
$b \le a$. A totally ordered subset of $S$ is also called a chain in $S$.

The form of Zorn’s lemma that will be used in the text is as follows.

Let $S$ be a set with a partial ordering “$\le$.” Assume that every chain $C$ in $S$
has an upper bound; in other words, there is an element $x \in S$ such that $x \ge a$
for all $a \in C$. Then $S$ has a maximal element, that is, an element $m$ such that
for each $a \in S$ it is not possible to have $m \le a$ and $m \ne a$.


Zorn’s lemma is actually an axiom of set theory, equivalent to the axiom 
of choice. 
FUNDAMENTALS OF MEASURE AND
INTEGRATION THEORY 


In this chapter we give a self-contained presentation of the basic concepts of 
the theory of measure and integration. The principles discussed here and in 
Chapter 2 will serve as background for the study of probability as well as 
harmonic analysis, linear space theory, and other areas of mathematics. 


1.1 INTRODUCTION 

It will be convenient to start with a little practice in the algebra of sets. 
This will serve as a refresher and also as a way of collecting a few results 
that will often be useful. 

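Let $A_1, A_2, \ldots$ be subsets of a set $\Omega$. If $A_1 \subset A_2 \subset \cdots$ and $\bigcup_{n=1}^{\infty} A_n = A$,
we say that the $A_n$ form an increasing sequence of sets with limit $A$, or that
the $A_n$ increase to $A$; we write $A_n \uparrow A$. If $A_1 \supset A_2 \supset \cdots$ and $\bigcap_{n=1}^{\infty} A_n = A$,
we say that the $A_n$ form a decreasing sequence of sets with limit $A$, or that
the $A_n$ decrease to $A$; we write $A_n \downarrow A$.

The De Morgan laws, namely, $(\bigcup_n A_n)^c = \bigcap_n A_n^c$, $(\bigcap_n A_n)^c = \bigcup_n A_n^c$,
imply that

(1) if $A_n \uparrow A$, then $A_n^c \downarrow A^c$; if $A_n \downarrow A$, then $A_n^c \uparrow A^c$.

It is sometimes useful to write a union of sets as a disjoint union. This may
be done as follows:
Let $A_1, A_2, \ldots$ be subsets of $\Omega$. For each $n$ we have

(2) $\bigcup_{i=1}^{n} A_i = A_1 \cup (A_1^c \cap A_2) \cup (A_1^c \cap A_2^c \cap A_3) \cup \cdots \cup (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n).$

Furthermore,

(3) $\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n).$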


In (2) and (3), the sets on the right are disjoint. If the $A_n$ form an increasing
sequence, the formulas become

(4) $\bigcup_{i=1}^{n} A_i = A_1 \cup (A_2 - A_1) \cup \cdots \cup (A_n - A_{n-1})$

and

(5) $\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} (A_n - A_{n-1})$

(take $A_0$ as the empty set).

The results (1)-(5) are proved using only the definitions of union,
intersection, and complementation; see Problem 1.
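
The disjoint decomposition in formula (2) can be sketched for finitely many finite sets as follows. The function name `disjointify` is a hypothetical choice; the code uses the fact that $A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n$ is the same set as $A_n - (A_1 \cup \cdots \cup A_{n-1})$.

```python
def disjointify(sets):
    """Return B_1, ..., B_n with B_k = A_k minus (A_1 ∪ ... ∪ A_{k-1}).

    The B_k are pairwise disjoint and have the same union as the A_k.
    """
    seen = set()            # running union A_1 ∪ ... ∪ A_{k-1}
    pieces = []
    for a in sets:
        pieces.append(frozenset(a) - seen)
        seen |= set(a)
    return pieces

# Illustrative input, not taken from the text.
A = [{1, 2}, {2, 3}, {1, 4}, {3}]
B = disjointify(A)
```

Here `B` consists of `{1, 2}`, `{3}`, `{4}`, and the empty set; a piece is empty exactly when the corresponding $A_n$ is already covered by the earlier sets.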

The following set operation will be of particular interest. If $A_1, A_2, \ldots$ are
subsets of $\Omega$, we define

(6) $\limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$

Thus $\omega \in \limsup_n A_n$ iff for every $n$, $\omega \in A_k$ for some $k \ge n$, in other
words,

(7) $\omega \in \limsup_n A_n$ iff $\omega \in A_n$ for infinitely many $n$.

Also define

(8) $\liminf_n A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k.$

Thus $\omega \in \liminf_n A_n$ iff for some $n$, $\omega \in A_k$ for all $k \ge n$, in other words,

(9) $\omega \in \liminf_n A_n$ iff $\omega \in A_n$ eventually, that is, for all but finitely
many $n$.

We shall call $\limsup_n A_n$ the upper limit of the sequence of sets $A_n$, and
$\liminf_n A_n$ the lower limit. The terminology is, of course, suggested by the
analogous concepts for sequences of real numbers

$$\limsup_{n} x_n = \inf_{n} \sup_{k \ge n} x_k,$$

$$\liminf_{n} x_n = \sup_{n} \inf_{k \ge n} x_k.$$

See Problem 4 for a further development of the analogy. 

The following facts may be verified (Problem 5): 

(10) $(\limsup_n A_n)^c = \liminf_n A_n^c$

(11) $(\liminf_n A_n)^c = \limsup_n A_n^c$

(12) $\liminf_n A_n \subset \limsup_n A_n$

(13) If $A_n \uparrow A$ or $A_n \downarrow A$, then $\liminf_n A_n = \limsup_n A_n = A$.

In general, if $\liminf_n A_n = \limsup_n A_n = A$, then $A$ is said to be the limit
of the sequence $A_1, A_2, \ldots$; we write $A = \lim_n A_n$.
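
Upper and lower limits involve infinitely many sets, so they cannot be computed by brute force in general. For a periodic sequence, however, every tail $A_n, A_{n+1}, \ldots$ contains a complete period, so each tail union equals the union over one period and each tail intersection equals the intersection over one period. The Python sketch below relies on that observation; the function names and the example period are illustrative assumptions.

```python
from functools import reduce

def limsup_periodic(period):
    """lim sup of the sequence period[0], period[1], ..., repeated forever.

    Every tail union equals the union over one full period.
    """
    return frozenset().union(*period)

def liminf_periodic(period):
    """lim inf of the same sequence (period must be nonempty).

    Every tail intersection equals the intersection over one full period.
    """
    return reduce(lambda a, b: a & b, (frozenset(s) for s in period))

# Illustrative sequence: A_n = {1, 2, 3} for odd n, {2, 3, 4} for even n.
period = [{1, 2, 3}, {2, 3, 4}]
upper = limsup_periodic(period)
lower = liminf_periodic(period)
```

In this example the points 1 and 4 belong to infinitely many $A_n$ but not to all but finitely many, so they lie in the upper limit and not in the lower limit, and the sequence has no limit.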


Problems 


1. Establish formulas (1)-(5).

2. Define sets of real numbers as follows. Let $A_n = (-1/n, 1]$ if $n$ is odd,
and $A_n = (-1, 1/n]$ if $n$ is even. Find $\limsup_n A_n$ and $\liminf_n A_n$.

3. Let $\Omega = \mathbb{R}^2$, $A_n$ the interior of the circle with center at $((-1)^n/n, 0)$ and
radius 1. Find $\limsup_n A_n$ and $\liminf_n A_n$.

4. Let $\{x_n\}$ be a sequence of real numbers, and let $A_n = (-\infty, x_n)$. What
is the connection between $\limsup_{n \to \infty} x_n$ and $\limsup_n A_n$ (similarly for
lim inf)?

5. Establish formulas (10)-(13).

6. Let $A = (a, b)$ and $B = (c, d)$ be disjoint open intervals of $\mathbb{R}$, and let
$C_n = A$ if $n$ is odd, $C_n = B$ if $n$ is even. Find $\limsup_n C_n$ and $\liminf_n C_n$.


1.2 FIELDS, σ-FIELDS, AND MEASURES

Length, area, and volume, as well as probability, are instances of the
measure concept that we are going to discuss. A measure is a set function, that
is, an assignment of a number $\mu(A)$ to each set $A$ in a certain class. Some
structure must be imposed on the class of sets on which $\mu$ is defined, and
probability considerations provide a good motivation for the type of structure
required. If $\Omega$ is a set whose points correspond to the possible outcomes of a
random experiment, certain subsets of $\Omega$ will be called “events” and assigned
a probability. Intuitively, $A$ is an event if the question “Does $\omega$ belong to $A$?”
has a definite yes or no answer after the experiment is performed (and the
outcome corresponds to the point $\omega \in \Omega$). Now if we can answer the question
“Is $\omega \in A$?” we can certainly answer the question “Is $\omega \in A^c$?”, and if, for
each $i = 1, \ldots, n$, we can decide whether or not $\omega$ belongs to $A_i$, then we can
determine whether or not $\omega$ belongs to $\bigcup_{i=1}^{n} A_i$ (and similarly for $\bigcap_{i=1}^{n} A_i$).
Thus it is natural to require that the class of events be closed under
complementation, finite union, and finite intersection; furthermore, as the answer to
the question “Is $\omega \in \Omega$?” is always “yes,” the entire space $\Omega$ should be an
event. Closure under countable union and intersection is difficult to justify
physically, and perhaps the most convincing reason for requiring it is that a
richer mathematical theory is obtained. Specifically, we are able to assert that
the limit of a sequence of events is an event; see 1.2.1.


1.2.1 Definitions. Let $\mathcal{F}$ be a collection of subsets of a set $\Omega$. Then $\mathcal{F}$ is
called a field (the term algebra is also used) iff $\Omega \in \mathcal{F}$ and $\mathcal{F}$ is closed under
complementation and finite union, that is,

(a) $\Omega \in \mathcal{F}$.
(b) If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
(c) If $A_1, A_2, \ldots, A_n \in \mathcal{F}$, then $\bigcup_{i=1}^{n} A_i \in \mathcal{F}$.

It follows that $\mathcal{F}$ is closed under finite intersection. For if $A_1, \ldots, A_n \in \mathcal{F}$,
then

$$\bigcap_{i=1}^{n} A_i = \left(\bigcup_{i=1}^{n} A_i^c\right)^c \in \mathcal{F}.$$

If (c) is replaced by closure under countable union, that is,

(d) If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$,

$\mathcal{F}$ is called a $\sigma$-field (the term $\sigma$-algebra is also used). Just as above, $\mathcal{F}$ is
also closed under countable intersection.

If $\mathcal{F}$ is a field, a countable union of sets in $\mathcal{F}$ can be expressed as the limit
of an increasing sequence of sets in $\mathcal{F}$, and conversely. To see this, note that
if $A = \bigcup_{n=1}^{\infty} A_n$, then $\bigcup_{i=1}^{n} A_i \uparrow A$; conversely, if $A_n \uparrow A$, then $A = \bigcup_{n=1}^{\infty} A_n$.
This shows that a $\sigma$-field is a field that is closed under limits of increasing
sequences.


1.2.2 Examples. The largest $\sigma$-field of subsets of a fixed set $\Omega$ is the
collection of all subsets of $\Omega$. The smallest $\sigma$-field consists of the two sets $\emptyset$
and $\Omega$.

Let $A$ be a nonempty proper subset of $\Omega$, and let $\mathcal{F} = \{\emptyset, \Omega, A, A^c\}$. Then
$\mathcal{F}$ is the smallest $\sigma$-field containing $A$. For if $\mathcal{G}$ is a $\sigma$-field and $A \in \mathcal{G}$, then
by definition of a $\sigma$-field, $\Omega$, $\emptyset$, and $A^c$ belong to $\mathcal{G}$, hence $\mathcal{F} \subset \mathcal{G}$. But $\mathcal{F}$ is
a $\sigma$-field, for if we form complements or unions of sets in $\mathcal{F}$, we invariably
obtain sets in $\mathcal{F}$. Thus $\mathcal{F}$ is a $\sigma$-field that is included in any $\sigma$-field containing
$A$, and the result follows.

If $A_1, \ldots, A_n$ are arbitrary subsets of $\Omega$, the smallest $\sigma$-field containing
$A_1, \ldots, A_n$ may be described explicitly; see Problem 8.

If $\mathcal{C}$ is a class of sets, the smallest $\sigma$-field containing the sets of $\mathcal{C}$ will be
written as $\sigma(\mathcal{C})$, and sometimes called the minimal $\sigma$-field over $\mathcal{C}$. We also
call $\sigma(\mathcal{C})$ the $\sigma$-field generated by $\mathcal{C}$, and currently this is probably the most
common terminology.
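
When $\Omega$ is finite, the generated $\sigma$-field can be computed by repeatedly closing a collection under complements and pairwise unions until nothing new appears; on a finite set, closure under finite unions already gives closure under countable unions. The Python sketch below takes this brute-force approach. The function name `generated_sigma_field` is hypothetical, and the method is only practical for very small $\Omega$.

```python
from itertools import combinations

def generated_sigma_field(omega, generators):
    """Smallest sigma-field on the finite set omega containing the generators."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:                      # iterate until a fixed point is reached
        changed = False
        current = list(field)
        for a in current:               # close under complementation
            c = omega - a
            if c not in field:
                field.add(c)
                changed = True
        for a, b in combinations(current, 2):   # close under pairwise union
            u = a | b
            if u not in field:
                field.add(u)
                changed = True
    return field

# With a single generator A = {1}, the result is {∅, Ω, A, A^c}.
example = generated_sigma_field({1, 2, 3, 4}, [{1}])
```

The loop terminates because the collection can only grow and is bounded by the finite power set of $\Omega$.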

Let $\Omega$ be the set $\mathbb{R}$ of real numbers. Let $\mathcal{F}$ consist of all finite disjoint
unions of right-semiclosed intervals. (A right-semiclosed interval is a set
of the form $(a, b] = \{x: a < x \le b\}$, $-\infty \le a < b < \infty$; by convention we
also count $(a, \infty)$ as right-semiclosed for $-\infty \le a < \infty$. The convention is
necessary because $(-\infty, a]$ belongs to $\mathcal{F}$, and if $\mathcal{F}$ is to be a field, the
complement $(a, \infty)$ must also belong to $\mathcal{F}$.) It may be verified that conditions
(a)-(c) of 1.2.1 hold; and thus $\mathcal{F}$ is a field. But $\mathcal{F}$ is not a $\sigma$-field; for
example, $A_n = (0, 1 - (1/n)] \in \mathcal{F}$, $n = 1, 2, \ldots$, and $\bigcup_{n=1}^{\infty} A_n = (0, 1) \notin \mathcal{F}$.

If $\Omega$ is the set $\overline{\mathbb{R}} = [-\infty, \infty]$ of extended real numbers, then just as above,
the collection of finite disjoint unions of right-semiclosed intervals forms a
field but not a $\sigma$-field. Here, the right-semiclosed intervals are sets of the
form $(a, b] = \{x: a < x \le b\}$, $-\infty \le a < b \le \infty$, and, by convention, the sets
$[-\infty, b] = \{x: -\infty \le x \le b\}$, $-\infty \le b \le \infty$. (In this case the convention is
necessary because $(b, \infty]$ must belong to $\mathcal{F}$, and therefore the complement
$[-\infty, b]$ also belongs to $\mathcal{F}$.)

There is a type of reasoning that occurs so often in problems involving
$\sigma$-fields that it deserves to be displayed explicitly, as in the following typical
illustration.
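If $\mathcal{C}$ is a class of subsets of $\Omega$ and $A \subset \Omega$, we denote by $\mathcal{C} \cap A$ the class
$\{B \cap A: B \in \mathcal{C}\}$. If the minimal $\sigma$-field over $\mathcal{C}$ is $\sigma(\mathcal{C}) = \mathcal{F}$, let us show
that

$$\sigma_A(\mathcal{C} \cap A) = \mathcal{F} \cap A,$$

where $\sigma_A(\mathcal{C} \cap A)$ is the minimal $\sigma$-field of subsets of $A$ over $\mathcal{C} \cap A$. (In other
words, $A$ rather than $\Omega$ is regarded as the entire space.)

Now $\mathcal{C} \subset \mathcal{F}$, hence $\mathcal{C} \cap A \subset \mathcal{F} \cap A$, and it is not hard to verify that $\mathcal{F} \cap A$
is a $\sigma$-field of subsets of $A$. Therefore $\sigma_A(\mathcal{C} \cap A) \subset \mathcal{F} \cap A$.

To establish the reverse inclusion we must show that $B \cap A \in \sigma_A(\mathcal{C} \cap A)$ for
all $B \in \mathcal{F}$. This is not obvious, so we resort to the following basic reasoning
process, which might be called the good sets principle. Let $\mathcal{S}$ be the class of
good sets, that is, let $\mathcal{S}$ consist of those sets $B \in \mathcal{F}$ such that

$$B \cap A \in \sigma_A(\mathcal{C} \cap A).$$

Since $\mathcal{F}$ and $\sigma_A(\mathcal{C} \cap A)$ are $\sigma$-fields, it follows quickly that $\mathcal{S}$ is a $\sigma$-field.
But $\mathcal{C} \subset \mathcal{S}$, so that $\sigma(\mathcal{C}) \subset \mathcal{S}$, hence $\mathcal{F} = \mathcal{S}$ and the result follows. Briefly,
every set in $\mathcal{C}$ is good and the class of good sets forms a $\sigma$-field; consequently,
every set in $\sigma(\mathcal{C})$ is good.

One other comment: If $\mathcal{C}$ is closed under finite intersection and $A \in \mathcal{C}$,
then $\mathcal{C} \cap A = \{C \in \mathcal{C}: C \subset A\}$. (Observe that if $C \subset A$, then $C = C \cap A$.)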


1.2.3 Definitions and Comments. A measure on a $\sigma$-field $\mathcal{F}$ is a
nonnegative, extended real-valued function $\mu$ on $\mathcal{F}$ such that whenever $A_1, A_2, \ldots$
form a finite or countably infinite collection of disjoint sets in $\mathcal{F}$, we have

$$\mu\left(\bigcup_n A_n\right) = \sum_n \mu(A_n).$$

If $\mu(\Omega) = 1$, $\mu$ is called a probability measure.

A measure space is a triple $(\Omega, \mathcal{F}, \mu)$ where $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-field
of subsets of $\Omega$, and $\mu$ is a measure on $\mathcal{F}$. If $\mu$ is a probability measure,
$(\Omega, \mathcal{F}, \mu)$ is called a probability space.

It will be convenient to have a slight generalization of the notion of a
measure on a $\sigma$-field. Let $\mathcal{F}$ be a field, $\mu$ a set function on $\mathcal{F}$ (a map from
$\mathcal{F}$ to $\overline{\mathbb{R}}$). We say that $\mu$ is countably additive on $\mathcal{F}$ iff whenever $A_1, A_2, \ldots$
form a finite or countably infinite collection of disjoint sets in $\mathcal{F}$ whose union
also belongs to $\mathcal{F}$ (this will always be the case if $\mathcal{F}$ is a $\sigma$-field) we have

$$\mu\left(\bigcup_n A_n\right) = \sum_n \mu(A_n).$$

If this requirement holds only for finite collections of disjoint sets in $\mathcal{F}$, $\mu$ is
said to be finitely additive on $\mathcal{F}$. To avoid the appearance of terms of the form
$+\infty - \infty$ in the summation, we always assume that $+\infty$ and $-\infty$ cannot both
belong to the range of $\mu$.

If $\mu$ is countably additive and $\mu(A) \ge 0$ for all $A \in \mathcal{F}$, $\mu$ is called a measure
on $\mathcal{F}$, a probability measure if $\mu(\Omega) = 1$.

Note that countable additivity actually implies finite additivity. For if $\mu(A) = +\infty$
for all $A \in \mathcal{F}$, or if $\mu(A) = -\infty$ for all $A \in \mathcal{F}$, the result is immediate;
therefore assume $\mu(A)$ finite for some $A \in \mathcal{F}$. By considering the sequence
$A, \emptyset, \emptyset, \ldots$, we find that $\mu(\emptyset) = 0$, and finite additivity is now established by
considering the sequence $A_1, \ldots, A_n, \emptyset, \emptyset, \ldots$, where $A_1, \ldots, A_n$ are disjoint
sets in $\mathcal{F}$.

Although the set function given by $\mu(A) = +\infty$ for all $A \in \mathcal{F}$ satisfies the
definition of a measure, and similarly $\mu(A) = -\infty$ for all $A \in \mathcal{F}$ defines a
countably additive set function, we shall from now on exclude these cases.
Thus by the above discussion, we always have $\mu(\emptyset) = 0$.

If $A \in \mathcal{F}$ and $\mu(A^c) = 0$, we can frequently ignore $A^c$; we say that $\mu$ is
concentrated on $A$.


1.2.4 Examples. Let $\Omega$ be any set, and let $\mathcal{F}$ consist of all subsets of
$\Omega$. Define $\mu(A)$ as the number of points of $A$. Thus if $A$ has $n$ members,
$n = 0, 1, 2, \ldots$, then $\mu(A) = n$; if $A$ is an infinite set, $\mu(A) = \infty$. The set
function $\mu$ is a measure on $\mathcal{F}$, called counting measure on $\Omega$.

A closely related measure is defined as follows. Let $\Omega = \{x_1, x_2, \ldots\}$ be
a finite or countably infinite set, and let $p_1, p_2, \ldots$ be nonnegative numbers.
Take $\mathcal{F}$ as all subsets of $\Omega$, and define

$$\mu(A) = \sum_{x_i \in A} p_i.$$

Thus if $A = \{x_{i_1}, x_{i_2}, \ldots\}$, then $\mu(A) = p_{i_1} + p_{i_2} + \cdots$. The set function $\mu$ is
a measure on $\mathcal{F}$ and $\mu\{x_i\} = p_i$, $i = 1, 2, \ldots$. A probability measure will be
obtained iff $\sum_i p_i = 1$; if all $p_i = 1$, then $\mu$ is counting measure.
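
For a finite $\Omega$, the weighted measure just described is easy to sketch in Python. The helper `make_measure`, the point labels, and the weights below are illustrative assumptions; exact rational arithmetic from the standard `fractions` module is used so that additivity can be checked without rounding error.

```python
from fractions import Fraction

def make_measure(weights):
    """Return mu with mu(A) = sum of weights[x] over x in A (A a finite set)."""
    def mu(A):
        return sum((weights[x] for x in A), Fraction(0))
    return mu

# Weights summing to 1 give a probability measure.
weights = {"x1": Fraction(1, 2), "x2": Fraction(1, 3), "x3": Fraction(1, 6)}
mu = make_measure(weights)

# Taking every weight equal to 1 gives counting measure.
counting = make_measure({x: Fraction(1) for x in weights})
```

For example, `mu({"x1", "x2", "x3"})` equals 1 and `counting({"x1", "x3"})` equals 2, the number of points in the set.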

Now if $A$ is a subset of $\mathbb{R}$, we try to arrive at a definition of the length of $A$.
If $A$ is an interval (open, closed, or semiclosed) with endpoints $a$ and $b$, it is
reasonable to take the length of $A$ to be $\mu(A) = b - a$. If $A$ is a complicated set,
we may not have any intuition about its length, but we shall see in Section 1.4
that the requirements that $\mu(a, b] = b - a$ for all $a, b \in \mathbb{R}$, $a < b$, and that $\mu$
be a measure, determine $\mu$ on a large class of sets.

Specifically, $\mu$ is determined on the collection of Borel sets of $\mathbb{R}$, denoted
by $\mathcal{B}(\mathbb{R})$ and defined as the smallest $\sigma$-field of subsets of $\mathbb{R}$ containing all
intervals $(a, b]$, $a, b \in \mathbb{R}$.

Note that $\mathcal{B}(\mathbb{R})$ is guaranteed to exist; it may be described (admittedly in a
rather ethereal way) as the intersection of all $\sigma$-fields containing the intervals
$(a, b]$. Also, if a $\sigma$-field contains, say, all open intervals, it must contain all
intervals $(a, b]$, and conversely. For

$$(a, b] = \bigcap_{n=1}^{\infty} \left(a, b + \frac{1}{n}\right) \quad \text{and} \quad (a, b) = \bigcup_{n=1}^{\infty} \left(a, b - \frac{1}{n}\right].$$

Thus $\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-field containing all open intervals. Similarly we
may replace the intervals $(a, b]$ by other classes of intervals, for instance,

all closed intervals,
all intervals $[a, b)$, $a, b \in \mathbb{R}$,
all intervals $(a, \infty)$, $a \in \mathbb{R}$,
all intervals $[a, \infty)$, $a \in \mathbb{R}$,
all intervals $(-\infty, b)$, $b \in \mathbb{R}$,
all intervals $(-\infty, b]$, $b \in \mathbb{R}$.

Since a $\sigma$-field that contains all intervals of a given type contains all
intervals of any other type, $\mathcal{B}(\mathbb{R})$ may be described as the smallest $\sigma$-field that
contains the class of all intervals of $\mathbb{R}$. Similarly, $\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-field
containing all open sets of $\mathbb{R}$. (To see this, recall that an open set is a
countable union of open intervals.) Since a set is open iff its complement is closed,
$\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-field containing all closed sets of $\mathbb{R}$. Finally, if $\mathcal{F}_0$ is
the field of finite disjoint unions of right-semiclosed intervals (see 1.2.2), then
$\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-field containing the sets of $\mathcal{F}_0$.

Intuitively, we may think of generating the Borel sets by starting with the 
intervals and forming complements and countable unions and intersections in 
all possible ways. This idea is made precise in Problem 11. 

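The class of Borel sets of $\overline{\mathbb{R}}$, denoted by $\mathcal{B}(\overline{\mathbb{R}})$, is defined as the smallest
$\sigma$-field of subsets of $\overline{\mathbb{R}}$ containing all intervals $(a, b]$, $a, b \in \overline{\mathbb{R}}$. The above
discussion concerning the replacement of the right-semiclosed intervals by
other classes of sets applies equally well to $\overline{\mathbb{R}}$.

If $E \in \mathcal{B}(\overline{\mathbb{R}})$, $\mathcal{B}(E)$ will denote $\{B \in \mathcal{B}(\overline{\mathbb{R}}): B \subset E\}$; this coincides with
$\{A \cap E: A \in \mathcal{B}(\overline{\mathbb{R}})\}$ (see 1.2.2).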

We now begin to develop some properties of set functions. 


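1.2.5 Theorem. Let $\mu$ be a finitely additive set function on the field $\mathcal{F}$.

(a) $\mu(\emptyset) = 0$.
(b) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$ for all $A, B \in \mathcal{F}$.
(c) If $A, B \in \mathcal{F}$ and $B \subset A$, then $\mu(A) = \mu(B) + \mu(A - B)$
(hence $\mu(A - B) = \mu(A) - \mu(B)$ if $\mu(B)$ is finite, and $\mu(B) \le \mu(A)$ if
$\mu(A - B) \ge 0$).
(d) If $\mu$ is nonnegative,

$$\mu\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} \mu(A_i) \quad \text{for all } A_1, \ldots, A_n \in \mathcal{F}.$$

If $\mu$ is a measure,

$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} \mu(A_n)$$

for all $A_1, A_2, \ldots \in \mathcal{F}$ such that $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.

Proof. (a) Pick $A \in \mathcal{F}$ such that $\mu(A)$ is finite; then

$$\mu(A) = \mu(A \cup \emptyset) = \mu(A) + \mu(\emptyset).$$

(b) By finite additivity,

$$\mu(A) = \mu(A \cap B) + \mu(A - B),$$
$$\mu(B) = \mu(A \cap B) + \mu(B - A).$$

Add the above equations to obtain

$$\mu(A) + \mu(B) = \mu(A \cap B) + [\mu(A - B) + \mu(B - A) + \mu(A \cap B)] = \mu(A \cap B) + \mu(A \cup B).$$

(c) We may write $A = B \cup (A - B)$, hence $\mu(A) = \mu(B) + \mu(A - B)$.
(d) We have

$$\bigcup_{i=1}^{n} A_i = A_1 \cup (A_1^c \cap A_2) \cup (A_1^c \cap A_2^c \cap A_3) \cup \cdots \cup (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n)$$

[see Section 1.1, formula (2)]. The sets on the right are disjoint and

$$\mu(A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n) \le \mu(A_n) \quad \text{by (c)}.$$

The case in which $\mu$ is a measure is handled using identity (3) of
Section 1.1. □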
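
Parts (b), (c), and (d) of Theorem 1.2.5 can be spot-checked numerically with counting measure on finite sets, for which $\mu(A)$ is simply the number of elements. The sets below are arbitrary illustrative choices, and a check on one example is of course not a proof.

```python
def mu(A):
    """Counting measure on finite sets."""
    return len(A)

A, B, C = {1, 2, 3}, {3, 4}, {4, 5, 6}
D = {1, 2}                               # a subset of A, used for part (c)

# (b): mu(A ∪ B) + mu(A ∩ B) = mu(A) + mu(B)
lhs_b = mu(A | B) + mu(A & B)
rhs_b = mu(A) + mu(B)

# (c): mu(A) = mu(D) + mu(A - D) when D ⊂ A
rhs_c = mu(D) + mu(A - D)

# (d): mu(A ∪ B ∪ C) <= mu(A) + mu(B) + mu(C)
union_measure = mu(A | B | C)
sum_of_measures = mu(A) + mu(B) + mu(C)
```

The inequality in (d) is strict here because the sets overlap: the points 3 and 4 are each counted twice in the sum on the right.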


1.2.6 Definitions. A set function $\mu$ defined on $\mathcal{F}$ is said to be finite iff
$\mu(A)$ is finite, that is, not $\pm\infty$, for each $A \in \mathcal{F}$. If $\mu$ is finitely additive, it is
sufficient to require that $\mu(\Omega)$ be finite; for $\Omega = A \cup A^c$, and if $\mu(A)$ is, say,
$+\infty$, so is $\mu(\Omega)$.

A nonnegative, finitely additive set function $\mu$ on the field $\mathcal{F}$ is said to be
$\sigma$-finite on $\mathcal{F}$ iff $\Omega$ can be written as $\bigcup_{n=1}^{\infty} A_n$ where the $A_n$ belong to $\mathcal{F}$ and
$\mu(A_n) < \infty$ for all $n$. [By formula (3) of Section 1.1, the $A_n$ may be assumed
disjoint.] We shall see that many properties of finite measures can be extended
quickly to $\sigma$-finite measures.

It follows from 1.2.5(c) that a nonnegative, finitely additive set function $\mu$
on a field $\mathcal{F}$ is finite iff it is bounded; that is, $\sup\{|\mu(A)|: A \in \mathcal{F}\} < \infty$. This
no longer holds if the nonnegativity assumption is dropped (see Problem 4).
It is true, however, that a countably additive set function on a $\sigma$-field is finite
iff it is bounded; this will be proved in 2.1.3.

Countably additive set functions have a basic continuity property, which we
now describe.


1.2.7 Theorem. Let $\mu$ be a countably additive set function on the $\sigma$-field $\mathcal{F}$.

(a) If $A_1, A_2, \ldots \in \mathcal{F}$ and $A_n \uparrow A$, then $\mu(A_n) \to \mu(A)$ as $n \to \infty$.

(b) If $A_1, A_2, \ldots \in \mathcal{F}$, $A_n \downarrow A$, and $\mu(A_1)$ is finite [hence $\mu(A_n)$ is
finite for all $n$ since $\mu(A_1) = \mu(A_n) + \mu(A_1 - A_n)$], then $\mu(A_n) \to \mu(A)$ as
$n \to \infty$.

The same results hold if $\mathcal{F}$ is only assumed to be a field, if we add the
hypothesis that the limit sets $A$ belong to $\mathcal{F}$. [If $A \notin \mathcal{F}$ and $\mu \ge 0$, 1.2.5(c)
implies that $\mu(A_n)$ increases to a limit in part (a), and decreases to a limit in
part (b), but we cannot identify the limit with $\mu(A)$.]

Proof. (a) If $\mu(A_n) = \infty$ for some $n$, then $\mu(A) = \mu(A_n) + \mu(A - A_n)
= \infty + \mu(A - A_n) = \infty$. Replacing $A$ by $A_k$ we find that $\mu(A_k) = \infty$ for all
$k \ge n$, and we are finished. In the same way we eliminate the case in which
$\mu(A_n) = -\infty$ for some $n$. Thus we may assume that all $\mu(A_n)$ are finite.

Since the $A_n$ form an increasing sequence, we may use identity (5) of
Section 1.1:

$$A = A_1 \cup (A_2 - A_1) \cup \cdots \cup (A_n - A_{n-1}) \cup \cdots.$$

Therefore, by 1.2.5(c),

$$\mu(A) = \mu(A_1) + \mu(A_2) - \mu(A_1) + \cdots + \mu(A_n) - \mu(A_{n-1}) + \cdots = \lim_{n \to \infty} \mu(A_n).$$

(b) If $A_n \downarrow A$, then $A_1 - A_n \uparrow A_1 - A$, hence $\mu(A_1 - A_n) \to \mu(A_1 - A)$
by (a). The result now follows from 1.2.5(c). □
We shall frequently encounter situations in which finite additivity of a
particular set function is easily established, but countable additivity is more difficult.
It is useful to have the result that finite additivity plus continuity implies
countable additivity.


1.2.8 Theorem. Let $\mu$ be a finitely additive set function on the field $\mathcal{F}$.

(a) Assume that $\mu$ is continuous from below at each $A \in \mathcal{F}$, that is, if
$A_1, A_2, \ldots \in \mathcal{F}$, $A = \bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$, and $A_n \uparrow A$, then $\mu(A_n) \to \mu(A)$. It follows
that $\mu$ is countably additive on $\mathcal{F}$.

(b) Assume that $\mu$ is continuous from above at the empty set, that is, if
$A_1, A_2, \ldots \in \mathcal{F}$ and $A_n \downarrow \emptyset$, then $\mu(A_n) \to 0$. It follows that $\mu$ is countably
additive on $\mathcal{F}$.


Proof. (a) Let $A_1, A_2, \ldots$ be disjoint sets in $\mathcal{F}$ whose union $A$ belongs to
$\mathcal{F}$. If $B_n = \bigcup_{i=1}^{n} A_i$ then $B_n \uparrow A$, hence $\mu(B_n) \to \mu(A)$ by hypothesis. But
$\mu(B_n) = \sum_{i=1}^{n} \mu(A_i)$ by finite additivity, hence $\mu(A) = \lim_{n \to \infty} \sum_{i=1}^{n} \mu(A_i)$,
the desired result.

(b) Let $A_1, A_2, \ldots$ be disjoint sets in $\mathcal{F}$ whose union $A$ belongs to $\mathcal{F}$, and
let $B_n = \bigcup_{i=1}^{n} A_i$. By 1.2.5(c), $\mu(A) = \mu(B_n) + \mu(A - B_n)$; but $A - B_n \downarrow \emptyset$,
so by hypothesis, $\mu(A - B_n) \to 0$. Thus $\mu(B_n) \to \mu(A)$, and the result follows
as in (a). □


If $\mu_1$ and $\mu_2$ are measures on the $\sigma$-field $\mathcal{F}$, then $\mu = \mu_1 - \mu_2$ is countably
additive on $\mathcal{F}$, assuming either $\mu_1$ or $\mu_2$ is finite-valued. We shall see later (in
2.1.3) that any countably additive set function on a $\sigma$-field can be expressed
as the difference of two measures.

For examples of finitely additive set functions that are not countably addi- 
tive, see Problems 1, 3, and 4. 
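
To make the gap between finite and countable additivity concrete, the following is a minimal sketch of the set function of Problem 3, taking $\Omega = \{0, 1, 2, \ldots\}$. The class `FinCofin` and the function `mu` are illustrative names introduced here, not notation from the text; a set is stored either as a finite set of points or as the complement of one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinCofin:
    """A subset of N = {0, 1, 2, ...} that is finite or cofinite.

    If cofinite is False the set is `points`; otherwise it is N minus `points`.
    """
    points: frozenset
    cofinite: bool = False

    def union(self, other):
        if not self.cofinite and not other.cofinite:
            return FinCofin(self.points | other.points, False)
        if self.cofinite and other.cofinite:
            return FinCofin(self.points & other.points, True)
        fin, cof = (self, other) if other.cofinite else (other, self)
        # (N - C) union F = N - (C - F)
        return FinCofin(cof.points - fin.points, True)

    def is_disjoint(self, other):
        if self.cofinite and other.cofinite:
            return False  # two cofinite subsets of N always intersect
        if not self.cofinite and not other.cofinite:
            return not (self.points & other.points)
        fin, cof = (self, other) if other.cofinite else (other, self)
        # F and N - C are disjoint iff F is a subset of C
        return fin.points <= cof.points


def mu(a):
    """mu(A) = 0 if A is finite, 1 if the complement of A is finite."""
    return 1 if a.cofinite else 0


# Finite additivity on one disjoint pair.
A = FinCofin(frozenset({0, 1}))
B = FinCofin(frozenset({0, 1, 2}), cofinite=True)
assert A.is_disjoint(B)
assert mu(A.union(B)) == mu(A) + mu(B)

# Continuity from below fails: A_n = {0, ..., n-1} increases to N.
OMEGA = FinCofin(frozenset(), cofinite=True)
print([mu(FinCofin(frozenset(range(n)))) for n in range(5)], mu(OMEGA))
```

Every $A_n$ has measure 0 while the limit set has measure 1, so by 1.2.7(a) this $\mu$ cannot be countably additive. The code only spot-checks additivity on one pair; it is an illustration, not a proof.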


Problems 


1. Let $\Omega$ be a countably infinite set, and let $\mathcal{F}$ consist of all subsets of $\Omega$.
Define $\mu(A) = 0$ if $A$ is finite, $\mu(A) = \infty$ if $A$ is infinite.
(a) Show that $\mu$ is finitely additive but not countably additive.
(b) Show that $\Omega$ is the limit of an increasing sequence of sets $A_n$ with
$\mu(A_n) = 0$ for all $n$, but $\mu(\Omega) = \infty$.

2. Let $\mu$ be counting measure on $\Omega$, where $\Omega$ is an infinite set. Show that
there is a sequence of sets $A_n \downarrow \emptyset$ with $\lim_{n \to \infty} \mu(A_n) \ne 0$.

3. Let $\Omega$ be a countably infinite set, and let $\mathcal{F}$ be the field consisting of all
finite subsets of $\Omega$ and their complements. If $A$ is finite, set $\mu(A) = 0$,
and if $A^c$ is finite, set $\mu(A) = 1$.
(a) Show that $\mu$ is finitely additive but not countably additive on $\mathcal{F}$.
(b) Show that $\Omega$ is the limit of an increasing sequence of sets $A_n \in \mathcal{F}$
with $\mu(A_n) = 0$ for all $n$, but $\mu(\Omega) = 1$.

4. Let $\mathcal{F}$ be the field of finite disjoint unions of right-semiclosed intervals
of $\mathbb{R}$, and define the set function $\mu$ on $\mathcal{F}$ as follows.

$$\mu(-\infty, a] = a, \quad a \in \mathbb{R},$$
$$\mu(a, b] = b - a, \quad a, b \in \mathbb{R}, \ a < b,$$
$$\mu(b, \infty) = -b, \quad b \in \mathbb{R},$$
$$\mu(\mathbb{R}) = 0,$$
$$\mu\Big(\bigcup_{i=1}^{n} I_i\Big) = \sum_{i=1}^{n} \mu(I_i)$$

if $I_1, \ldots, I_n$ are disjoint right-semiclosed intervals.

(a) Show that $\mu$ is finitely additive but not countably additive on $\mathcal{F}$.
(b) Show that $\mu$ is finite but unbounded on $\mathcal{F}$.


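5. Let $\mu$ be a nonnegative, finitely additive set function on the field $\mathcal{F}$. If
$A_1, A_2, \ldots$ are disjoint sets in $\mathcal{F}$ and $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$, show that

$$\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) \ge \sum_{n=1}^{\infty} \mu(A_n).$$
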
6. Let $f: \Omega \to \Omega'$, and let $\mathcal{C}$ be a class of subsets of $\Omega'$. Show that

$$\sigma(f^{-1}(\mathcal{C})) = f^{-1}(\sigma(\mathcal{C})),$$

where $f^{-1}(\mathcal{C}) = \{f^{-1}(A): A \in \mathcal{C}\}$. (Use the good sets principle.)

7. If $A$ is a Borel subset of $\mathbb{R}$, show that the smallest $\sigma$-field of subsets of
$A$ containing the sets open in $A$ (in the relative topology inherited from
$\mathbb{R}$) is $\{B \in \mathcal{B}(\mathbb{R}): B \subset A\}$.

8. Let $A_1, \ldots, A_n$ be arbitrary subsets of a set $\Omega$. Describe (explicitly) the
smallest $\sigma$-field $\mathcal{F}$ containing $A_1, \ldots, A_n$. How many sets are there in
$\mathcal{F}$? (Give an upper bound that is attainable under certain conditions.)
List all the sets in $\mathcal{F}$ when $n = 2$.

9. (a) Let $\mathcal{C}$ be an arbitrary class of subsets of $\Omega$, and let $\mathcal{G}$ be the
collection of all finite unions $\bigcup_{i=1}^{n} A_i$, $n = 1, 2, \ldots$, where each $A_i$ is a
finite intersection $\bigcap_{j=1}^{m} B_{ij}$, with $B_{ij}$ or its complement a set in $\mathcal{C}$.
Show that $\mathcal{G}$ is the minimal field (not $\sigma$-field) over $\mathcal{C}$.

(b) Show that the minimal field can also be described as the collection
$\mathcal{D}$ of all finite disjoint unions $\bigcup_{i=1}^{n} A_i$, where the $A_i$ are as above.

(c) If $\mathcal{F}_1, \ldots, \mathcal{F}_n$ are fields of subsets of $\Omega$, show that the smallest field
including $\mathcal{F}_1, \ldots, \mathcal{F}_n$ consists of all finite (disjoint) unions of sets
$A_1 \cap \cdots \cap A_n$ with $A_i \in \mathcal{F}_i$, $i = 1, \ldots, n$.

10. Let $\mu$ be a finite measure on the $\sigma$-field $\mathcal{F}$. If $A_n \in \mathcal{F}$, $n = 1, 2, \ldots$, and
$A = \lim_n A_n$ (see Section 1.1), show that $\mu(A) = \lim_{n \to \infty} \mu(A_n)$.


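11.* Let $\mathcal{C}$ be any class of subsets of $\Omega$, with $\emptyset, \Omega \in \mathcal{C}$. Define $\mathcal{C}_0 = \mathcal{C}$,
and for any ordinal $\alpha > 0$ write, inductively,

$$\mathcal{C}_\alpha = \Big(\bigcup\{\mathcal{C}_\beta: \beta < \alpha\}\Big)',$$

where $\mathcal{D}'$ denotes the class of all countable unions of differences of sets
in $\mathcal{D}$.

Let $\mathcal{S} = \bigcup\{\mathcal{C}_\alpha: \alpha < \beta_1\}$, where $\beta_1$ is the first uncountable ordinal,
and let $\mathcal{F}$ be the minimal $\sigma$-field over $\mathcal{C}$. Since each $\mathcal{C}_\alpha \subset \mathcal{F}$, we have
$\mathcal{S} \subset \mathcal{F}$. Also, the $\mathcal{C}_\alpha$ increase with $\alpha$, and $\mathcal{C} \subset \mathcal{C}_\alpha$ for all $\alpha$.

(a) Show that $\mathcal{S}$ is a $\sigma$-field (hence $\mathcal{S} = \mathcal{F}$ by minimality of $\mathcal{F}$).

(b) If the cardinality of $\mathcal{C}$ is at most $c$, the cardinality of the reals,
show that card $\mathcal{F} \le c$ also.

12. Show that if $\mu$ is a finite measure, there cannot be uncountably many
disjoint sets $A$ such that $\mu(A) > 0$.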


1.3 EXTENSION OF MEASURES

In 1.2.4, we discussed the concept of length of a subset of $\mathbb{R}$. The problem
was to extend the set function given on intervals by $\mu(a, b] = b - a$ to a larger
class of sets. If $\mathcal{F}_0$ is the field of finite disjoint unions of right-semiclosed
intervals, there is no problem extending $\mu$ to $\mathcal{F}_0$: if $A_1, \ldots, A_n$ are disjoint
right-semiclosed intervals, we set $\mu(\bigcup_{i=1}^{n} A_i) = \sum_{i=1}^{n} \mu(A_i)$. The resulting set
function on $\mathcal{F}_0$ is finitely additive, but countable additivity is not clear at this
point. Even if we can prove countable additivity on $\mathcal{F}_0$, we still have the
problem of extending $\mu$ to the minimal $\sigma$-field over $\mathcal{F}_0$, namely, the Borel sets.

We are going to consider a generalization of the above problem. Instead of
working only with length, we shall examine set functions given by $\mu(a, b]
= F(b) - F(a)$ where $F$ is an increasing right-continuous function from $\mathbb{R}$
to $\mathbb{R}$. The extension technique to be developed is not restricted to set functions
defined on subsets of $\mathbb{R}$; we shall prove a general result concerning the
extension of a measure from a field $\mathcal{F}_0$ to the minimal $\sigma$-field over $\mathcal{F}_0$.

It will be convenient to consider finite measures at first, and nothing is lost
if we normalize and work with probability measures.


1.3.1 Lemma. Let $\mathcal{F}_0$ be a field of subsets of a set $\Omega$, and let $P$ be a
probability measure on $\mathcal{F}_0$. Suppose that the sets $A_1, A_2, \ldots$ belong to $\mathcal{F}_0$ and
increase to a limit $A$, and that the sets $A_1', A_2', \ldots$ belong to $\mathcal{F}_0$ and increase
to $A'$. ($A$ and $A'$ need not belong to $\mathcal{F}_0$.) If $A \subset A'$, then

$$\lim_{m \to \infty} P(A_m) \le \lim_{n \to \infty} P(A_n').$$

Thus if $A_n$ and $A_n'$ both increase to the same limit $A$, then

$$\lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} P(A_n').$$

Proof. If $m$ is fixed, $A_m \cap A_n' \uparrow A_m \cap A' = A_m$ as $n \to \infty$, hence

$$P(A_m \cap A_n') \to P(A_m)$$

by 1.2.7(a). But $P(A_m \cap A_n') \le P(A_n')$ by 1.2.5(c), hence

$$P(A_m) = \lim_{n \to \infty} P(A_m \cap A_n') \le \lim_{n \to \infty} P(A_n').$$

Let $m \to \infty$ to finish the proof. □


We are now ready for the first extension of P to a larger class of sets. 


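1.3.2 Lemma. Let $P$ be a probability measure on the field $\mathcal{F}_0$. Let $\mathcal{G}$ be the
collection of all limits of increasing sequences of sets in $\mathcal{F}_0$, that is, $A \in \mathcal{G}$
iff there are sets $A_n \in \mathcal{F}_0$, $n = 1, 2, \ldots$, such that $A_n \uparrow A$. (Note that $\mathcal{G}$ can
also be described as the collection of all countable unions of sets in $\mathcal{F}_0$; see
1.2.1.)

Define $\mu$ on $\mathcal{G}$ as follows. If $A_n \in \mathcal{F}_0$, $n = 1, 2, \ldots$, $A_n \uparrow A$ $(\in \mathcal{G})$, set
$\mu(A) = \lim_{n \to \infty} P(A_n)$; $\mu$ is well defined by 1.3.1, and $\mu = P$ on $\mathcal{F}_0$. Then:

(a) $\emptyset \in \mathcal{G}$ and $\mu(\emptyset) = 0$; $\Omega \in \mathcal{G}$ and $\mu(\Omega) = 1$; $0 \le \mu(A) \le 1$ for all
$A \in \mathcal{G}$.

(b) If $G_1, G_2 \in \mathcal{G}$, then $G_1 \cup G_2$, $G_1 \cap G_2 \in \mathcal{G}$ and

$$\mu(G_1 \cup G_2) + \mu(G_1 \cap G_2) = \mu(G_1) + \mu(G_2).$$

(c) If $G_1, G_2 \in \mathcal{G}$ and $G_1 \subset G_2$, then $\mu(G_1) \le \mu(G_2)$.

(d) If $G_n \in \mathcal{G}$, $n = 1, 2, \ldots$, and $G_n \uparrow G$,
then $G \in \mathcal{G}$ and $\mu(G_n) \to \mu(G)$.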


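Proof. (a) This is clear since $\mu = P$ on $\mathcal{F}_0$ and $P$ is a probability measure.

(b) Let $A_{n1} \in \mathcal{F}_0$, $A_{n1} \uparrow G_1$; $A_{n2} \in \mathcal{F}_0$, $A_{n2} \uparrow G_2$. We have $P(A_{n1} \cup A_{n2})
+ P(A_{n1} \cap A_{n2}) = P(A_{n1}) + P(A_{n2})$ by 1.2.5(b); let $n \to \infty$ to complete the
argument.

(c) This follows from 1.3.1.

(d) Since $G$ is a countable union of sets in $\mathcal{F}_0$, $G \in \mathcal{G}$. Now for each
$n$ we can find sets $A_{nm} \in \mathcal{F}_0$, $m = 1, 2, \ldots$, with $A_{nm} \uparrow G_n$ as $m \to \infty$. The
situation may be represented schematically as follows:

$$\begin{array}{ccccccc}
A_{11} & A_{12} & \cdots & A_{1m} & \cdots & \uparrow & G_1 \\
A_{21} & A_{22} & \cdots & A_{2m} & \cdots & \uparrow & G_2 \\
\vdots & \vdots & & \vdots & & & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nm} & \cdots & \uparrow & G_n \\
\vdots & \vdots & & \vdots & & & \vdots
\end{array}$$

Let $D_m = A_{1m} \cup A_{2m} \cup \cdots \cup A_{mm}$ (the $D_m$ form an increasing sequence).
The key step in the proof is the observation that

$$A_{nm} \subset D_m \subset G_m \quad \text{for } n \le m \qquad (1)$$

and, therefore,

$$P(A_{nm}) \le P(D_m) \le \mu(G_m) \quad \text{for } n \le m. \qquad (2)$$

Let $m \to \infty$ in (1) to obtain $G_n \subset \bigcup_{m=1}^{\infty} D_m \subset G$; then let $n \to \infty$ to conclude
that $D_m \uparrow G$, hence $P(D_m) \to \mu(G)$ by definition of $\mu$. Now let $m \to \infty$ in
(2) to obtain $\mu(G_n) \le \lim_{m \to \infty} P(D_m) \le \lim_{m \to \infty} \mu(G_m)$; then let $n \to \infty$ to
conclude that $\lim_{n \to \infty} \mu(G_n) = \lim_{m \to \infty} P(D_m) = \mu(G)$. □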


We now extend $\mu$ to the class of all subsets of $\Omega$; however, the extension
will not be countably additive on all subsets, but only on a smaller $\sigma$-field.
The construction depends on properties (a)-(d) of 1.3.2, and not on the fact
that $\mu$ was derived from a probability measure on a field. We express this
explicitly as follows:

1.3.3 Lemma. Let $\mathcal{G}$ be a class of subsets of a set $\Omega$, $\mu$ a nonnegative
real-valued set function on $\mathcal{G}$ such that $\mathcal{G}$ and $\mu$ satisfy the four conditions
(a)-(d) of 1.3.2. Define, for each $A \subset \Omega$,

$$\mu^*(A) = \inf\{\mu(G): G \in \mathcal{G},\ G \supset A\}.$$

Then:

(a) $\mu^* = \mu$ on $\mathcal{G}$, $0 \le \mu^*(A) \le 1$ for all $A \subset \Omega$.
(b) $\mu^*(A \cup B) + \mu^*(A \cap B) \le \mu^*(A) + \mu^*(B)$; in particular, $\mu^*(A)
+ \mu^*(A^c) \ge \mu^*(\Omega) + \mu^*(\emptyset) = \mu(\Omega) + \mu(\emptyset) = 1$ by 1.3.2(a).
(c) If $A \subset B$, then $\mu^*(A) \le \mu^*(B)$.
(d) If $A_n \uparrow A$, then $\mu^*(A_n) \to \mu^*(A)$.


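Proof. (a) This is clear from the definition of $\mu^*$ and from 1.3.2(c).

(b) If $\varepsilon > 0$, choose $G_1, G_2 \in \mathcal{G}$, $G_1 \supset A$, $G_2 \supset B$, such that $\mu(G_1)
\le \mu^*(A) + \varepsilon/2$, $\mu(G_2) \le \mu^*(B) + \varepsilon/2$. By 1.3.2(b),

$$\mu^*(A) + \mu^*(B) + \varepsilon \ge \mu(G_1) + \mu(G_2) = \mu(G_1 \cup G_2) + \mu(G_1 \cap G_2)
\ge \mu^*(A \cup B) + \mu^*(A \cap B).$$

Since $\varepsilon$ is arbitrary, the result follows.

(c) This follows from the definition of $\mu^*$.

(d) By (c), $\mu^*(A) \ge \lim_{n \to \infty} \mu^*(A_n)$. If $\varepsilon > 0$, for each $n$ we may choose
$G_n \in \mathcal{G}$, $G_n \supset A_n$, such that

$$\mu(G_n) \le \mu^*(A_n) + \varepsilon 2^{-n}.$$

Now $A = \bigcup_{n=1}^{\infty} A_n \subset \bigcup_{n=1}^{\infty} G_n \in \mathcal{G}$; hence

$$\mu^*(A) \le \mu^*\Big(\bigcup_{n=1}^{\infty} G_n\Big) \quad \text{by (c)}$$
$$= \mu\Big(\bigcup_{n=1}^{\infty} G_n\Big) \quad \text{by (a)}$$
$$= \lim_{n \to \infty} \mu\Big(\bigcup_{k=1}^{n} G_k\Big) \quad \text{by 1.3.2(d)}.$$

The proof will be accomplished if we prove that

$$\mu\Big(\bigcup_{i=1}^{n} G_i\Big) \le \mu^*(A_n) + \varepsilon \sum_{i=1}^{n} 2^{-i}, \quad n = 1, 2, \ldots.$$

This is true for $n = 1$, by choice of $G_1$. If it holds for a given $n$, we apply
1.3.2(b) to the sets $\bigcup_{i=1}^{n} G_i$ and $G_{n+1}$ to obtain

$$\mu\Big(\bigcup_{i=1}^{n+1} G_i\Big) = \mu\Big(\bigcup_{i=1}^{n} G_i\Big) + \mu(G_{n+1}) - \mu\Big(\Big(\bigcup_{i=1}^{n} G_i\Big) \cap G_{n+1}\Big).$$

Now $(\bigcup_{i=1}^{n} G_i) \cap G_{n+1} \supset G_n \cap G_{n+1} \supset A_n \cap A_{n+1} = A_n$, so that the induction
hypothesis yields

$$\mu\Big(\bigcup_{i=1}^{n+1} G_i\Big) \le \mu^*(A_n) + \varepsilon \sum_{i=1}^{n} 2^{-i} + \mu^*(A_{n+1}) + \varepsilon 2^{-(n+1)} - \mu^*(A_n)
\le \mu^*(A_{n+1}) + \varepsilon \sum_{i=1}^{n+1} 2^{-i}. \quad □$$
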


i=] 


Our aim in this section is to prove that a $\sigma$-finite measure on a field $\mathcal{F}_0$ has a
unique extension to the minimal $\sigma$-field over $\mathcal{F}_0$. In fact an arbitrary measure $\mu$
on $\mathcal{F}_0$ can be extended to $\sigma(\mathcal{F}_0)$, but the extension is not necessarily unique.
In proving this more general result (see Problem 3), the following concept
plays a key role.


1.3.4 Definition. An outer measure on $\Omega$ is a nonnegative, extended real-valued
set function $\lambda$ on the class of all subsets of $\Omega$, satisfying

(a) $\lambda(\emptyset) = 0$,

(b) $A \subset B$ implies $\lambda(A) \le \lambda(B)$ (monotonicity), and

(c) $\lambda(\bigcup_{n=1}^{\infty} A_n) \le \sum_{n=1}^{\infty} \lambda(A_n)$ (countable subadditivity).

The set function $\mu^*$ of 1.3.3 is an outer measure on $\Omega$. Parts 1.3.4(a) and (b)
follow from 1.3.3(a), 1.3.2(a), and 1.3.3(c), and 1.3.4(c) is proved as follows:

$$\mu^*\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \lim_{n \to \infty} \mu^*\Big(\bigcup_{i=1}^{n} A_i\Big) \quad \text{by 1.3.3(d)}$$
$$\le \lim_{n \to \infty} \sum_{i=1}^{n} \mu^*(A_i) \quad \text{by 1.3.3(b)},$$

as desired.

We now identify a $\sigma$-field on which $\mu^*$ is countably additive:


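1.3.5 Theorem. Under the hypothesis of 1.3.2, with $\mu^*$ defined as in 1.3.3,
let $\mathcal{H} = \{H \subset \Omega: \mu^*(H) + \mu^*(H^c) = 1\}$

[$\mathcal{H} = \{H \subset \Omega: \mu^*(H) + \mu^*(H^c) \le 1\}$ by 1.3.3(b).]

Then $\mathcal{H}$ is a $\sigma$-field and $\mu^*$ is a probability measure on $\mathcal{H}$.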


Proof. First note that $\mathcal{G} \subset \mathcal{H}$. For if $A_n \in \mathcal{F}_0$ and $A_n \uparrow G \in \mathcal{G}$, then
$G^c \subset A_n^c$, so $P(A_n) + \mu^*(G^c) \le P(A_n) + P(A_n^c) = 1$. By 1.3.3(d), $\mu^*(G)
+ \mu^*(G^c) \le 1$.

Clearly $\mathcal{H}$ is closed under complementation, and $\Omega \in \mathcal{H}$ by 1.3.3(a) and
1.3.2(a). If $H_1, H_2 \subset \Omega$, then by 1.3.3(b),

$$\mu^*(H_1 \cup H_2) + \mu^*(H_1 \cap H_2) \le \mu^*(H_1) + \mu^*(H_2) \qquad (1)$$

and since

$$(H_1 \cup H_2)^c = H_1^c \cap H_2^c, \qquad (H_1 \cap H_2)^c = H_1^c \cup H_2^c,$$

we have

$$\mu^*((H_1 \cup H_2)^c) + \mu^*((H_1 \cap H_2)^c) \le \mu^*(H_1^c) + \mu^*(H_2^c). \qquad (2)$$

If $H_1, H_2 \in \mathcal{H}$, add (1) and (2); the sum of the left sides is at least
2 by 1.3.3(b), and the sum of the right sides is 2. Thus the sum of the left
sides is 2 as well. If $a = \mu^*(H_1 \cup H_2) + \mu^*((H_1 \cup H_2)^c)$, $b = \mu^*(H_1 \cap H_2)
+ \mu^*((H_1 \cap H_2)^c)$, then $a + b = 2$, hence $a \le 1$ or $b \le 1$. If $a \le 1$, then $a = 1$,
so $b = 1$ also. Consequently $H_1 \cup H_2 \in \mathcal{H}$ and $H_1 \cap H_2 \in \mathcal{H}$. We have
therefore shown that $\mathcal{H}$ is a field. Now equality holds in (1), for if not,
the sum of the left sides of (1) and (2) would be less than the sum of the right
sides, a contradiction. Thus $\mu^*$ is finitely additive on $\mathcal{H}$.

To show that $\mathcal{H}$ is a $\sigma$-field, let $H_n \in \mathcal{H}$, $n = 1, 2, \ldots$, $H_n \uparrow H$; $\mu^*(H)
+ \mu^*(H^c) \ge 1$ by 1.3.3(b). But $\mu^*(H) = \lim_{n \to \infty} \mu^*(H_n)$ by 1.3.3(d), hence
for any $\varepsilon > 0$, $\mu^*(H) \le \mu^*(H_n) + \varepsilon$ for large $n$. Since $\mu^*(H^c) \le \mu^*(H_n^c)$ for
all $n$ by 1.3.3(c), and $H_n \in \mathcal{H}$, we have $\mu^*(H) + \mu^*(H^c) \le 1 + \varepsilon$. Since $\varepsilon$ is
arbitrary, $H \in \mathcal{H}$, making $\mathcal{H}$ a $\sigma$-field.

Since $\mu^*(H_n) \to \mu^*(H)$, $\mu^*$ is countably additive by 1.2.8(a). □


We now have our first extension theorem. 


1.3.6 Theorem. A finite measure on a field $\mathcal{F}_0$ can be extended to a measure
on $\sigma(\mathcal{F}_0)$.

Proof. Nothing is lost by considering a probability measure. (Replace $\mu$ by
$\mu/\mu(\Omega)$ if necessary.) The result then follows from 1.3.1-1.3.5 if we observe
that $\mathcal{F}_0 \subset \mathcal{G} \subset \mathcal{H}$, hence $\sigma(\mathcal{F}_0) \subset \mathcal{H}$. Thus $\mu^*$ restricted to $\sigma(\mathcal{F}_0)$ is the
desired extension. □

In fact there is very little difference between $\sigma(\mathcal{F}_0)$ and $\mathcal{H}$; if $B \in \mathcal{H}$,
then $B$ can be expressed as $A \cup N$, where $A \in \sigma(\mathcal{F}_0)$ and $N$ is a subset of a
set $M \in \sigma(\mathcal{F}_0)$ with $\mu^*(M) = 0$. To establish this, we introduce the idea of
completion of a measure space.


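1.3.7 Definitions. A measure $\mu$ on a $\sigma$-field $\mathcal{F}$ is said to be complete iff
whenever $A \in \mathcal{F}$ and $\mu(A) = 0$ we have $B \in \mathcal{F}$ for all $B \subset A$.

In 1.3.5, $\mu^*$ on $\mathcal{H}$ is complete, for if $B \subset A \in \mathcal{H}$, $\mu^*(A) = 0$, then $\mu^*(B) +
\mu^*(B^c) \le \mu^*(A) + \mu^*(B^c) = \mu^*(B^c) \le 1$; thus $B \in \mathcal{H}$.

The completion of a measure space $(\Omega, \mathcal{F}, \mu)$ is defined as follows. Let
$\mathcal{F}_\mu$ be the class of sets $A \cup N$, where $A$ ranges over $\mathcal{F}$ and $N$ over all subsets
of sets of measure 0 in $\mathcal{F}$.

Now $\mathcal{F}_\mu$ is a $\sigma$-field including $\mathcal{F}$, for it is clearly closed under countable
union, and if $A \cup N \in \mathcal{F}_\mu$, $N \subset M \in \mathcal{F}$, $\mu(M) = 0$, then $(A \cup N)^c = A^c \cap N^c
= (A^c \cap M^c) \cup (A^c \cap (N^c - M^c))$ and $A^c \cap (N^c - M^c) = A^c \cap (M - N) \subset M$,
so $(A \cup N)^c \in \mathcal{F}_\mu$.

We extend $\mu$ to $\mathcal{F}_\mu$ by setting $\mu(A \cup N) = \mu(A)$. This is a valid definition,
for if $A_1 \cup N_1 = A_2 \cup N_2 \in \mathcal{F}_\mu$, we have

$$\mu(A_1) = \mu(A_1 \cap A_2) + \mu(A_1 - A_2) = \mu(A_1 \cap A_2)$$

since $A_1 - A_2 \subset N_2$. Thus $\mu(A_1) \le \mu(A_2)$, and by symmetry, $\mu(A_1) = \mu(A_2)$.
The measure space $(\Omega, \mathcal{F}_\mu, \mu)$ is called the completion of $(\Omega, \mathcal{F}, \mu)$, and $\mathcal{F}_\mu$
the completion of $\mathcal{F}$ relative to $\mu$.

Note that the completion is in fact complete, for if $M \subset A \cup N \in \mathcal{F}_\mu$ where
$A \in \mathcal{F}$, $\mu(A) = 0$, $N \subset B \in \mathcal{F}$, $\mu(B) = 0$, then $M \subset A \cup B \in \mathcal{F}$, $\mu(A \cup B)
= 0$; hence $M \in \mathcal{F}_\mu$.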


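1.3.8 Theorem. In 1.3.6, $(\Omega, \mathcal{H}, \mu^*)$ is the completion of $(\Omega, \sigma(\mathcal{F}_0), \mu^*)$.

Proof. We must show that $\mathcal{H} = \mathcal{F}_{\mu^*}$ where $\mathcal{F} = \sigma(\mathcal{F}_0)$. If $A \in \mathcal{H}$, by
definition of $\mu^*(A)$ and $\mu^*(A^c)$ we can find sets $G_n, G_n' \in \sigma(\mathcal{F}_0)$,
$n = 1, 2, \ldots$, with $G_n \subset A \subset G_n'$ and $\mu^*(G_n) \to \mu^*(A)$, $\mu^*(G_n') \to \mu^*(A)$.
Let $G = \bigcup_{n=1}^{\infty} G_n$, $G' = \bigcap_{n=1}^{\infty} G_n'$. Then $A = G \cup (A - G)$, $G \in \sigma(\mathcal{F}_0)$,
$A - G \subset G' - G \in \sigma(\mathcal{F}_0)$, $\mu^*(G' - G) \le \mu^*(G_n' - G_n) \to 0$, so that
$\mu^*(G' - G) = 0$. Thus $A \in \mathcal{F}_{\mu^*}$.

Conversely if $B \in \mathcal{F}_{\mu^*}$, then $B = A \cup N$, $A \in \mathcal{F}$, $N \subset M \in \mathcal{F}$, $\mu^*(M) = 0$.
Since $\mathcal{F} \subset \mathcal{H}$ we have $A \in \mathcal{H}$, and since $(\Omega, \mathcal{H}, \mu^*)$ is complete we have
$N \in \mathcal{H}$. Thus $B \in \mathcal{H}$. □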
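
The construction of 1.3.3-1.3.8 can be traced by brute force on a small finite space. The sketch below is only an illustration under stated assumptions: $\Omega = \{0, \ldots, 4\}$, the field $\mathcal{F}_0$ is generated by a hypothetical partition with masses 1/2, 1/2, 0, and the names `BLOCKS`, `FIELD`, `outer`, `H`, and `completion` are introduced here. Because $\mathcal{F}_0$ is finite, the class $\mathcal{G}$ of 1.3.2 coincides with $\mathcal{F}_0$ and $\sigma(\mathcal{F}_0) = \mathcal{F}_0$.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(5))
# Hypothetical partition generating the field F0, with the P-mass of each block.
BLOCKS = {
    frozenset({0, 1}): Fraction(1, 2),
    frozenset({2}): Fraction(1, 2),
    frozenset({3, 4}): Fraction(0),
}


def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# F0 = all unions of blocks; P is additive over the blocks.
FIELD = {
    frozenset().union(*chosen): sum((BLOCKS[b] for b in chosen), Fraction(0))
    for chosen in powerset(BLOCKS)
}


def outer(a):
    """mu*(A) = inf{mu(G): G in G, G contains A}; here G = F0 is finite."""
    return min(p for g, p in FIELD.items() if a <= g)


# The class H of 1.3.5: sets with mu*(H) + mu*(H^c) = 1.
H = {h for h in powerset(OMEGA) if outer(h) + outer(OMEGA - h) == 1}

# The completion of 1.3.7: sets A | N with A in F0 and N inside a null set of F0.
null_sets = [g for g, p in FIELD.items() if p == 0]
completion = {a | n for a in FIELD for m in null_sets for n in powerset(m)}

print(len(FIELD), len(H), H == completion)
```

In this example $\mathcal{H}$ is strictly larger than $\mathcal{F}_0$ (it picks up the subsets of the null block $\{3, 4\}$) yet excludes sets such as $\{0\}$ that split a block of positive mass, in line with 1.3.8.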


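To prove the uniqueness of the extension from $\mathcal{F}_0$ to $\mathcal{F}$, we need the
following basic result.

1.3.9 Monotone Class Theorem. Let $\mathcal{F}_0$ be a field of subsets of $\Omega$, and $\mathcal{C}$
a class of subsets of $\Omega$ that is monotone (if $A_n \in \mathcal{C}$ and $A_n \uparrow A$ or $A_n \downarrow A$,
then $A \in \mathcal{C}$). If $\mathcal{C} \supset \mathcal{F}_0$, then $\mathcal{C} \supset \sigma(\mathcal{F}_0)$, the minimal $\sigma$-field over $\mathcal{F}_0$.

Proof. The technique of the proof might be called "boot strapping." Let
$\mathcal{F} = \sigma(\mathcal{F}_0)$ and let $\mathcal{M}$ be the smallest monotone class containing all sets of
$\mathcal{F}_0$. We show that $\mathcal{M} = \mathcal{F}$, in other words, the smallest monotone class and
the smallest $\sigma$-field over a field coincide. The proof is completed by observing
that $\mathcal{M} \subset \mathcal{C}$.

Fix $A \in \mathcal{M}$ and let $\mathcal{M}_A = \{B \in \mathcal{M}: A \cap B,\ A \cap B^c \text{ and } A^c \cap B \in \mathcal{M}\}$;
then $\mathcal{M}_A$ is a monotone class. In fact $\mathcal{M}_A = \mathcal{M}$; for if $A \in \mathcal{F}_0$, then $\mathcal{F}_0 \subset
\mathcal{M}_A$ since $\mathcal{F}_0$ is a field, hence $\mathcal{M} \subset \mathcal{M}_A$ by minimality of $\mathcal{M}$; consequently
$\mathcal{M}_A = \mathcal{M}$. But this shows that for any $B \in \mathcal{M}$ we have $A \cap B$, $A \cap B^c$,
$A^c \cap B \in \mathcal{M}$ for any $A \in \mathcal{F}_0$, so that $\mathcal{M}_B \supset \mathcal{F}_0$. Again by minimality of $\mathcal{M}$,
$\mathcal{M}_B = \mathcal{M}$.

Now $\mathcal{M}$ is a field (for if $A, B \in \mathcal{M} = \mathcal{M}_A$, then $A \cap B$, $A \cap B^c$, $A^c \cap B
\in \mathcal{M}$) and a monotone class that is also a field is a $\sigma$-field (see 1.2.1), hence
$\mathcal{M}$ is a $\sigma$-field. Thus $\mathcal{F} \subset \mathcal{M}$ by minimality of $\mathcal{F}$, and in fact $\mathcal{F} = \mathcal{M}$
because $\mathcal{F}$ is a monotone class including $\mathcal{F}_0$. □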


We now prove the fundamental extension theorem. 


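1.3.10 Carathéodory Extension Theorem. Let $\mu$ be a measure on the field
$\mathcal{F}_0$ of subsets of $\Omega$, and assume that $\mu$ is $\sigma$-finite on $\mathcal{F}_0$, so that $\Omega$ can be
decomposed as $\bigcup_{n=1}^{\infty} A_n$, where $A_n \in \mathcal{F}_0$ and $\mu(A_n) < \infty$ for all $n$. Then $\mu$
has a unique extension to a measure on the minimal $\sigma$-field $\mathcal{F}$ over $\mathcal{F}_0$.

Proof. Since $\mathcal{F}_0$ is a field, the $A_n$ may be taken as disjoint [replace $A_n$
by $A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n$, as in formula (3) of 1.1]. Let $\mu_n(A) = \mu(A \cap A_n)$,
$A \in \mathcal{F}_0$; then $\mu_n$ is a finite measure on $\mathcal{F}_0$, hence by 1.3.6 it has an extension
$\mu_n^*$ to $\mathcal{F}$. As $\mu = \sum_n \mu_n$, the set function $\mu^* = \sum_n \mu_n^*$ is an extension of $\mu$,
and it is a measure on $\mathcal{F}$ since the order of summation of any double series
of nonnegative terms can be reversed.

Now suppose that $\lambda$ is a measure on $\mathcal{F}$ and $\lambda = \mu$ on $\mathcal{F}_0$. Define $\lambda_n(A)
= \lambda(A \cap A_n)$, $A \in \mathcal{F}$. Then $\lambda_n$ is a finite measure on $\mathcal{F}$ and $\lambda_n = \mu_n = \mu_n^*$
on $\mathcal{F}_0$, and it follows that $\lambda_n = \mu_n^*$ on $\mathcal{F}$. For $\mathcal{C} = \{A \in \mathcal{F}: \lambda_n(A) = \mu_n^*(A)\}$
is a monotone class (by 1.2.7) that contains all sets of $\mathcal{F}_0$, hence $\mathcal{C} = \mathcal{F}$ by
1.3.9. But then $\lambda = \sum_n \lambda_n = \sum_n \mu_n^* = \mu^*$, proving uniqueness. □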


The intuitive idea of constructing a minimal $\sigma$-field by forming complements
and countable unions and intersections in all possible ways suggests that if
$\mathcal{F}_0$ is a field and $\mathcal{F} = \sigma(\mathcal{F}_0)$, sets in $\mathcal{F}$ can be approximated in some sense
by sets in $\mathcal{F}_0$. The following result formalizes this notion.


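1.3.11 Approximation Theorem. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let
$\mathcal{F}_0$ be a field of subsets of $\Omega$ such that $\sigma(\mathcal{F}_0) = \mathcal{F}$. Assume that $\mu$ is $\sigma$-finite
on $\mathcal{F}_0$, and let $\varepsilon > 0$ be given. If $A \in \mathcal{F}$ and $\mu(A) < \infty$, there is a set $B \in \mathcal{F}_0$
such that $\mu(A \,\triangle\, B) < \varepsilon$.

Proof. Let $\mathcal{G}$ be the class of all countable unions of sets of $\mathcal{F}_0$. The
conclusion of 1.3.11 holds for any $A \in \mathcal{G}$, by 1.2.7(a). By 1.3.3, if $\mu$ is finite and
$A \in \mathcal{F}$, $A$ can be approximated arbitrarily closely (in the sense of 1.3.11) by
a set in $\mathcal{G}$, and therefore 1.3.11 is proved for finite $\mu$. In general, let $\Omega$ be the
disjoint union of sets $A_n \in \mathcal{F}_0$ with $\mu(A_n) < \infty$, and let $\mu_n(C) = \mu(C \cap A_n)$,
$C \in \mathcal{F}$.

Then $\mu_n$ is a finite measure on $\mathcal{F}$, hence if $A \in \mathcal{F}$, there is a set $B_n \in \mathcal{F}_0$
such that $\mu_n(A \,\triangle\, B_n) < \varepsilon 2^{-n}$. Since

$$\mu_n(A \,\triangle\, B_n) = \mu((A \,\triangle\, B_n) \cap A_n)
= \mu[(A \,\triangle\, (B_n \cap A_n)) \cap A_n] = \mu_n(A \,\triangle\, (B_n \cap A_n)),$$

and $B_n \cap A_n \in \mathcal{F}_0$, we may assume that $B_n \subset A_n$. (The observation that $B_n \cap
A_n \in \mathcal{F}_0$ is the point where we use the hypothesis that $\mu$ is $\sigma$-finite on $\mathcal{F}_0$,
not merely on $\mathcal{F}$.) If $C = \bigcup_{n=1}^{\infty} B_n$, then $C \cap A_n = B_n$, so that

$$\mu_n(A \,\triangle\, C) = \mu((A \,\triangle\, C) \cap A_n) = \mu((A \,\triangle\, B_n) \cap A_n) = \mu_n(A \,\triangle\, B_n),$$

hence

$$\mu(A \,\triangle\, C) = \sum_{n=1}^{\infty} \mu_n(A \,\triangle\, C) < \varepsilon.$$

But $\bigcup_{k=1}^{N} B_k - A \uparrow C - A$ as $N \to \infty$,
and $A - \bigcup_{k=1}^{N} B_k \downarrow A - C$. If $A \in \mathcal{F}$ and $\mu(A) < \infty$, it follows from 1.2.7
that $\mu(A \,\triangle\, \bigcup_{k=1}^{N} B_k) \to \mu(A \,\triangle\, C)$ as $N \to \infty$, hence is less than $\varepsilon$ for large
enough $N$. Set $B = \bigcup_{k=1}^{N} B_k \in \mathcal{F}_0$. □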


1.3.12 Example. Let $\Omega$ be the rationals, $\mathcal{F}_0$ the field of finite disjoint unions
of right-semiclosed intervals $(a, b] = \{\omega \in \Omega: a < \omega \le b\}$, $a, b$ rational
[counting $(a, \infty)$ and $\Omega$ itself as right-semiclosed; see 1.2.2]. Let $\mathcal{F} = \sigma(\mathcal{F}_0)$.
Then:

(a) $\mathcal{F}$ consists of all subsets of $\Omega$.

(b) If $\mu(A)$ is the number of points in $A$ ($\mu$ is counting measure), then $\mu$
is $\sigma$-finite on $\mathcal{F}$ but not on $\mathcal{F}_0$.

(c) There are sets $A \in \mathcal{F}$ of finite measure that cannot be approximated
by sets in $\mathcal{F}_0$, that is, there is no sequence $A_n \in \mathcal{F}_0$ with $\mu(A \,\triangle\, A_n) \to 0$.

(d) If $\lambda = 2\mu$, then $\lambda = \mu$ on $\mathcal{F}_0$ but not on $\mathcal{F}$.

Thus both the approximation theorem and the Carathéodory extension theorem
fail in this case.

Proof. (a) We have $\{x\} = \bigcap_{n=1}^{\infty} (x - (1/n), x]$, and therefore all singletons
are in $\mathcal{F}$. But then all sets are in $\mathcal{F}$ since $\Omega$ is countable.

(b) Since $\Omega$ is a countable union of singletons, $\mu$ is $\sigma$-finite on $\mathcal{F}$. But
every nonempty set in $\mathcal{F}_0$ has infinite measure, so $\mu$ is not $\sigma$-finite on $\mathcal{F}_0$.

(c) If $A$ is any finite nonempty subset of $\Omega$, then $\mu(A \,\triangle\, B) = \infty$ for all
nonempty $B \in \mathcal{F}_0$, because any nonempty set in $\mathcal{F}_0$ must contain infinitely
many points not in $A$.

(d) Since $\lambda\{x\} = 2$ and $\mu\{x\} = 1$, $\lambda \ne \mu$ on $\mathcal{F}$. But $\lambda(A) = \mu(A)
= \infty$, $A \in \mathcal{F}_0$ (except for $A = \emptyset$). □


Problems 


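1. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $\mathcal{F}_\mu$ be the completion of $\mathcal{F}$
relative to $\mu$. If $A \subset \Omega$, define:

$$\mu_0(A) = \sup\{\mu(B): B \in \mathcal{F},\ B \subset A\}, \qquad \mu^0(A) = \inf\{\mu(B): B \in \mathcal{F},\ B \supset A\}.$$

If $A \in \mathcal{F}_\mu$, show that $\mu_0(A) = \mu^0(A) = \mu(A)$. Conversely, if $\mu_0(A)
= \mu^0(A) < \infty$, show that $A \in \mathcal{F}_\mu$.

2. Show that the monotone class theorem (1.3.9) fails if $\mathcal{F}_0$ is not assumed
to be a field.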


3. This problem deals with the extension of an arbitrary (not necessarily 
o-finite) measure on a field. 


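(a) Let $\lambda$ be an outer measure on the set $\Omega$ (see 1.3.4). We say that the
set $E$ is $\lambda$-measurable iff

$$\lambda(A) = \lambda(A \cap E) + \lambda(A \cap E^c) \quad \text{for all } A \subset \Omega.$$

(The equals sign may be replaced by "$\ge$" by subadditivity of $\lambda$.) If
$\mathcal{M}$ is the class of all $\lambda$-measurable sets, show that $\mathcal{M}$ is a $\sigma$-field,
and that if $E_1, E_2, \ldots$ are disjoint sets in $\mathcal{M}$ whose union is $E$, and
$A \subset \Omega$, we have

$$\lambda(A \cap E) = \sum_{n} \lambda(A \cap E_n). \qquad (1)$$

In particular, $\lambda$ is a measure on $\mathcal{M}$. [Use the definition of $\lambda$-measurability
to show that $\mathcal{M}$ is a field and that (1) holds for finite
sequences. If $E_1, E_2, \ldots$ are disjoint sets in $\mathcal{M}$ and $F_n = \bigcup_{i=1}^{n} E_i \uparrow
E$, show that

$$\lambda(A) \ge \lambda(A \cap F_n) + \lambda(A \cap E^c) = \sum_{i=1}^{n} \lambda(A \cap E_i) + \lambda(A \cap E^c),$$

and then let $n \to \infty$.]

(b) Let $\mu$ be a measure on a field $\mathcal{F}_0$ of subsets of $\Omega$. If $A \subset \Omega$, define

$$\mu^*(A) = \inf\Big\{\sum_{n} \mu(E_n): A \subset \bigcup_{n} E_n,\ E_n \in \mathcal{F}_0\Big\}.$$

Show that $\mu^*$ is an outer measure on $\Omega$ and that $\mu^* = \mu$ on $\mathcal{F}_0$.

(c) In (b), if $\mathcal{M}$ is the class of $\mu^*$-measurable sets, show that $\mathcal{F}_0 \subset \mathcal{M}$.
Thus by (a) and (b), $\mu$ may be extended to the minimal $\sigma$-field over
$\mathcal{F}_0$.

(d) In (b), if $\mu$ is $\sigma$-finite on $\mathcal{F}_0$, show that $(\Omega, \mathcal{M}, \mu^*)$ is the completion
of $[\Omega, \sigma(\mathcal{F}_0), \mu^*]$.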


1.4 LEBESGUE-STIELTJES MEASURES AND DISTRIBUTION FUNCTIONS

We are now in a position to construct a large class of measures on the Borel
sets of $\mathbb{R}$. If $F$ is an increasing, right-continuous function from $\mathbb{R}$ to $\mathbb{R}$, we set
$\mu(a, b] = F(b) - F(a)$; we then extend $\mu$ to a finitely additive set function
on the field $\mathcal{F}_0(\mathbb{R})$ of finite disjoint unions of right-semiclosed intervals. If we
can show that $\mu$ is countably additive on $\mathcal{F}_0(\mathbb{R})$, the Carathéodory extension
theorem extends $\mu$ to $\mathcal{B}(\mathbb{R})$.


1.4.1 Definitions. A Lebesgue-Stieltjes measure on $\mathbb{R}$ is a measure $\mu$ on
$\mathcal{B}(\mathbb{R})$ such that $\mu(I) < \infty$ for each bounded interval $I$. A distribution function
on $\mathbb{R}$ is a map $F: \mathbb{R} \to \mathbb{R}$ that is increasing [$a < b$ implies $F(a) \le F(b)$]
and right-continuous [$\lim_{x \to x_0^+} F(x) = F(x_0)$]. We are going to show that the
formula $\mu(a, b] = F(b) - F(a)$ sets up a one-to-one correspondence between
Lebesgue-Stieltjes measures and distribution functions, where two distribution
functions that differ by a constant are identified.

1.4.2 Theorem. Let $\mu$ be a Lebesgue-Stieltjes measure on $\mathbb{R}$. Let
$F: \mathbb{R} \to \mathbb{R}$ be defined, up to an additive constant, by $F(b) - F(a) = \mu(a, b]$.
[For example, fix $F(0)$ arbitrarily and set $F(x) - F(0) = \mu(0, x]$, $x > 0$;
$F(0) - F(x) = \mu(x, 0]$, $x < 0$.] Then $F$ is a distribution function.

Proof. If $a < b$, then $F(b) - F(a) = \mu(a, b] \ge 0$. If $\{x_n\}$ is a sequence of
points such that $x_1 > x_2 > \cdots \to x$, then $F(x_n) - F(x) = \mu(x, x_n] \to 0$ by
1.2.7(b). □


Now let $F$ be a distribution function on $\mathbb{R}$. It will be convenient to
work in the compact space $\overline{\mathbb{R}}$, so we extend $F$ to a map of $\overline{\mathbb{R}}$ into
$\overline{\mathbb{R}}$ by defining $F(\infty) = \lim_{x \to \infty} F(x)$, $F(-\infty) = \lim_{x \to -\infty} F(x)$; the limits
exist by monotonicity. Define $\mu(a, b] = F(b) - F(a)$, $a, b \in \overline{\mathbb{R}}$, $a < b$, and
let $\mu[-\infty, b] = F(b) - F(-\infty) = \mu(-\infty, b]$; then $\mu$ is defined on all right-semiclosed
intervals of $\overline{\mathbb{R}}$ (counting $[-\infty, b]$ as right-semiclosed; see 1.2.2).

If $I_1, \ldots, I_k$ are disjoint right-semiclosed intervals of $\overline{\mathbb{R}}$, we define
$\mu(\bigcup_{j=1}^{k} I_j) = \sum_{j=1}^{k} \mu(I_j)$. Thus $\mu$ is extended to the field $\mathcal{F}_0(\overline{\mathbb{R}})$ of finite
disjoint unions of right-semiclosed intervals of $\overline{\mathbb{R}}$, and $\mu$ is finitely additive
on $\mathcal{F}_0(\overline{\mathbb{R}})$. To show that $\mu$ is in fact countably additive on $\mathcal{F}_0(\overline{\mathbb{R}})$, we make
use of 1.2.8(b), as follows.


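1.4.3 Lemma. The set function $\mu$ is countably additive on $\mathcal{F}_0(\overline{\mathbb{R}})$.

Proof. First assume that $F(\infty) - F(-\infty) < \infty$, so that $\mu$ is finite. Let
$A_1, A_2, \ldots$ be a sequence of sets in $\mathcal{F}_0(\overline{\mathbb{R}})$ decreasing to $\emptyset$. If $(a, b]$ is one
of the intervals of $A_n$, then by right continuity of $F$, $\mu(a', b] = F(b) - F(a')
\to F(b) - F(a) = \mu(a, b]$ as $a' \to a$ from above.

Thus we can find sets $B_n \in \mathcal{F}_0(\overline{\mathbb{R}})$ whose closures $\overline{B}_n$ (in $\overline{\mathbb{R}}$) are included
in $A_n$, with $\mu(B_n)$ approximating $\mu(A_n)$. If $\varepsilon > 0$ is given, the finiteness of $\mu$
allows us to choose the $B_n$ so that $\mu(A_n) - \mu(B_n) < \varepsilon 2^{-n}$. Now $\bigcap_{n=1}^{\infty} \overline{B}_n = \emptyset$,
and it follows that $\bigcap_{k=1}^{n} \overline{B}_k = \emptyset$ for sufficiently large $n$. (Perhaps the easiest
way to see this is to note that the sets $\overline{\mathbb{R}} - \overline{B}_n$ form an open covering of the
compact set $\overline{\mathbb{R}}$, hence there is a finite subcovering, so that $\bigcup_{k=1}^{n} (\overline{\mathbb{R}} - \overline{B}_k) = \overline{\mathbb{R}}$
for some $n$. Therefore $\bigcap_{k=1}^{n} \overline{B}_k = \emptyset$.) Now

$$\begin{aligned}
\mu(A_n) &= \mu\Big(A_n - \bigcap_{k=1}^{n} B_k\Big) + \mu\Big(\bigcap_{k=1}^{n} B_k\Big) \\
&= \mu\Big(A_n - \bigcap_{k=1}^{n} B_k\Big) \\
&\le \mu\Big(\bigcup_{k=1}^{n} (A_k - B_k)\Big) \quad \text{since } A_n \subset A_{n-1} \subset \cdots \subset A_1 \\
&\le \sum_{k=1}^{n} \mu(A_k - B_k) \quad \text{by 1.2.5(d)} \\
&< \varepsilon.
\end{aligned}$$

Thus $\mu(A_n) \to 0$.

Now if $F(\infty) - F(-\infty) = \infty$, define $F_n(x) = F(x)$, $|x| \le n$; $F_n(x) = F(n)$,
$x \ge n$; $F_n(x) = F(-n)$, $x \le -n$. If $\mu_n$ is the set function corresponding to
$F_n$, then $\mu_n \le \mu$ and $\mu_n \to \mu$ on $\mathcal{F}_0(\overline{\mathbb{R}})$. Let $A_1, A_2, \ldots$ be disjoint sets in
$\mathcal{F}_0(\overline{\mathbb{R}})$ such that $A = \bigcup_{n=1}^{\infty} A_n \in \mathcal{F}_0(\overline{\mathbb{R}})$. Then $\mu(A) \ge \sum_{n=1}^{\infty} \mu(A_n)$
(Problem 5, Section 1.2) so if $\sum_{n=1}^{\infty} \mu(A_n) = \infty$, we are finished. If $\sum_{n=1}^{\infty} \mu(A_n)
< \infty$, then

$$\mu(A) = \lim_{n \to \infty} \mu_n(A) = \lim_{n \to \infty} \sum_{k=1}^{\infty} \mu_n(A_k)$$

since the $\mu_n$ are finite. Now as $\sum_{k=1}^{\infty} \mu(A_k) < \infty$, we may write

$$0 \le \mu(A) - \sum_{k=1}^{\infty} \mu(A_k) = \lim_{n \to \infty} \sum_{k=1}^{\infty} [\mu_n(A_k) - \mu(A_k)] \le 0 \quad \text{since } \mu_n \le \mu. \quad □$$
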


We now complete the construction of Lebesgue—Stieltjes measures. 


1.4.4 Theorem. Let $F$ be a distribution function on $\mathbb{R}$, and let $\mu(a, b]
= F(b) - F(a)$, $a < b$. There is a unique extension of $\mu$ to a Lebesgue-Stieltjes
measure on $\mathbb{R}$.

Proof. Extend $\mu$ to a countably additive set function on $\mathcal{F}_0(\overline{\mathbb{R}})$ as above.
Let $\mathcal{F}_0(\mathbb{R})$ be the field of all finite disjoint unions of right-semiclosed intervals
of $\mathbb{R}$ [counting $(a, \infty)$ as right-semiclosed; see 1.2.2], and extend $\mu$ to
$\mathcal{F}_0(\mathbb{R})$ as in the discussion that follows 1.4.2. [Take $\mu(a, \infty) = F(\infty) - F(a)$;
$\mu(-\infty, b] = F(b) - F(-\infty)$, $a, b \in \mathbb{R}$; $\mu(\mathbb{R}) = F(\infty) - F(-\infty)$; note that
there is no other possible choice for $\mu$ on these sets, by 1.2.7(a).] Now the map

$$(a, b] \to (a, b] \quad \text{if } a, b \in \mathbb{R} \text{ or if } b \in \mathbb{R},\ a = -\infty,$$
$$(a, \infty] \to (a, \infty) \quad \text{if } a \in \mathbb{R} \text{ or if } a = -\infty$$

sets up a one-to-one, $\mu$-preserving correspondence between a subset of $\mathcal{F}_0(\overline{\mathbb{R}})$
(everything in $\mathcal{F}_0(\overline{\mathbb{R}})$ except sets including intervals of the form $[-\infty, b]$)
and $\mathcal{F}_0(\mathbb{R})$. It follows that $\mu$ is countably additive on $\mathcal{F}_0(\mathbb{R})$. Furthermore,
$\mu$ is $\sigma$-finite on $\mathcal{F}_0(\mathbb{R})$ since $\mu(-n, n] < \infty$; note that $\mu$ need not be $\sigma$-finite
on $\mathcal{F}_0(\overline{\mathbb{R}})$ since the sets $(-n, n]$ do not cover $\overline{\mathbb{R}}$. By the Carathéodory
extension theorem, $\mu$ has a unique extension to $\mathcal{B}(\mathbb{R})$. The extension is
a Lebesgue-Stieltjes measure because $\mu(a, b] = F(b) - F(a) < \infty$ for $a, b
\in \mathbb{R}$, $a < b$. □

1.4.5 Comments and Examples. If $F$ is a distribution function and $\mu$ the
corresponding Lebesgue-Stieltjes measure, we have seen that $\mu(a, b]
= F(b) - F(a)$, $a < b$. The measure of any interval, right-semiclosed or not,
may be expressed in terms of $F$. For if $F(x^-)$ denotes $\lim_{y \to x^-} F(y)$, then

(1) $\mu(a, b] = F(b) - F(a)$, (3) $\mu[a, b] = F(b) - F(a^-)$,
(2) $\mu(a, b) = F(b^-) - F(a)$, (4) $\mu[a, b) = F(b^-) - F(a^-)$.

(Thus if $F$ is continuous at $a$ and $b$, all four expressions are equal.) For
example, to prove (2), observe that

$$\mu(a, b) = \lim_{n \to \infty} \mu\Big(a, b - \frac{1}{n}\Big] = \lim_{n \to \infty} F\Big(b - \frac{1}{n}\Big) - F(a) = F(b^-) - F(a).$$

Statement (3) follows because

$$\mu[a, b] = \lim_{n \to \infty} \mu\Big(a - \frac{1}{n}, b\Big] = \lim_{n \to \infty} \Big[F(b) - F\Big(a - \frac{1}{n}\Big)\Big] = F(b) - F(a^-);$$

(4) is proved similarly. The proof of (3) works even if $a = b$, so that
$\mu\{x\} = F(x) - F(x^-)$. Thus

(5) $F$ is continuous at $x$ iff $\mu\{x\} = 0$; the magnitude of a discontinuity of
$F$ at $x$ coincides with the measure of $\{x\}$.
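
Formulas (1)-(5) can be checked on a concrete distribution function. The sketch below assumes a hypothetical $F(x) = x + {}$(total jump mass at points $\le x$), with atoms of mass 0.5 at 0 and 0.25 at 1; the names `JUMPS`, `F`, `F_left`, and the `mu_*` helpers are introduced here for illustration only. The left limit $F(x^-)$ is written in closed form rather than computed numerically.

```python
# Hypothetical atoms: point -> jump size of F at that point.
JUMPS = {0.0: 0.5, 1.0: 0.25}


def F(x):
    """Right-continuous F: Lebesgue part x plus jumps at points <= x."""
    return x + sum(m for p, m in JUMPS.items() if p <= x)


def F_left(x):
    """Left limit F(x^-): same formula but counting only jumps at points < x."""
    return x + sum(m for p, m in JUMPS.items() if p < x)


def mu_oc(a, b):   # (1)  mu(a, b] = F(b) - F(a)
    return F(b) - F(a)


def mu_oo(a, b):   # (2)  mu(a, b) = F(b^-) - F(a)
    return F_left(b) - F(a)


def mu_cc(a, b):   # (3)  mu[a, b] = F(b) - F(a^-)
    return F(b) - F_left(a)


def mu_co(a, b):   # (4)  mu[a, b) = F(b^-) - F(a^-)
    return F_left(b) - F_left(a)


def mu_point(x):   # (5)  mu{x} = F(x) - F(x^-)
    return F(x) - F_left(x)


print(mu_oc(0, 1), mu_oo(0, 1), mu_cc(0, 1), mu_co(0, 1))
print(mu_point(0.0), mu_point(0.5), mu_point(1.0))
```

The four interval measures differ exactly by the atoms at the endpoints that each interval includes, and `mu_point` vanishes at 0.5, where this $F$ is continuous.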

The following formulas are obtained from (1)-(3) by allowing $a$ to approach
$-\infty$ or $b$ to approach $+\infty$.

(6) $\mu(-\infty, x] = F(x) - F(-\infty)$, (9) $\mu[x, \infty) = F(\infty) - F(x^-)$,
(7) $\mu(-\infty, x) = F(x^-) - F(-\infty)$, (10) $\mu(\mathbb{R}) = F(\infty) - F(-\infty)$.
(8) $\mu(x, \infty) = F(\infty) - F(x)$,

(The formulas (6), (8), and (10) have already been observed in the proof of
1.4.4.)

If $\mu$ is finite, then $F$ is bounded; since $F$ may always be adjusted by an
additive constant, nothing is lost in this case if we set $F(-\infty) = 0$.

We may now generate a large number of measures on $\mathcal{B}(\mathbb{R})$. For example,
if $f: \mathbb{R} \to \mathbb{R}$, $f \ge 0$, and $f$ is integrable (Riemann for now) on any finite
interval, then if we fix $F(0)$ arbitrarily and define

$$F(x) - F(0) = \int_0^x f(t)\,dt, \quad x > 0;$$
$$F(0) - F(x) = \int_x^0 f(t)\,dt, \quad x < 0,$$

then $F$ is a (continuous) distribution function and thus gives rise to a
Lebesgue-Stieltjes measure; specifically,

$$\mu(a, b] = \int_a^b f(x)\,dx.$$

In particular, we may take $f(x) = 1$ for all $x$, and $F(x) = x$; then $\mu(a, b]
= b - a$. The set function $\mu$ is called the Lebesgue measure on $\mathcal{B}(\mathbb{R})$. The
completion of $\mathcal{B}(\mathbb{R})$ relative to Lebesgue measure is called the class of
Lebesgue measurable sets, written $\overline{\mathcal{B}}(\mathbb{R})$. Thus a Lebesgue measurable set
is the union of a Borel set and a subset of a Borel set of Lebesgue measure
0. The extension of Lebesgue measure to $\overline{\mathcal{B}}(\mathbb{R})$ is also called "Lebesgue
measure."
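
As a numerical sanity check of $\mu(a, b] = F(b) - F(a) = \int_a^b f(x)\,dx$, the sketch below assumes the hypothetical density $f(t) = |t|$ with $F(0) = 0$, for which the two defining integrals give $F(x) = x|x|/2$. The helper `riemann` is a plain midpoint rule written for this illustration; none of these names come from the text.

```python
def f(t):
    """Hypothetical nonnegative density, Riemann integrable on bounded intervals."""
    return abs(t)


def F(x):
    """Distribution function with F(0) = 0 built from f: F(x) = x|x|/2."""
    return x * abs(x) / 2


def mu(a, b):
    """Lebesgue-Stieltjes measure of (a, b] via mu(a, b] = F(b) - F(a)."""
    return F(b) - F(a)


def riemann(g, a, b, n=30_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h


a, b = -1.0, 2.0
print(mu(a, b), riemann(f, a, b))
```

Both numbers agree to within the quadrature error. Replacing `f` by the constant 1 and `F` by the identity would give $\mu(a, b] = b - a$, the Lebesgue measure case.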

Now let $\mu$ be a Lebesgue-Stieltjes measure that is concentrated on a
countable set $S = \{x_1, x_2, \ldots\}$, that is, $\mu(\mathbb{R} - S) = 0$. [In general if $(\Omega, \mathcal{F}, \mu)$
is a measure space and $B \in \mathcal{F}$, we say that $\mu$ is concentrated on $B$ iff
$\mu(\Omega - B) = 0$.] In the present case, such a measure is easily constructed:
If $a_1, a_2, \ldots$ are nonnegative numbers and $A \subset \mathbb{R}$, set $\mu(A) = \sum\{a_i: x_i \in A\}$;
$\mu$ is a measure on all subsets of $\mathbb{R}$, not merely on the Borel sets (see 1.2.4). If
$\mu(I) < \infty$ for each bounded interval $I$, $\mu$ will be a Lebesgue-Stieltjes measure
on $\mathcal{B}(\mathbb{R})$; if $\sum_i a_i < \infty$, $\mu$ will be a finite measure. The distribution
function $F$ corresponding to $\mu$ is continuous on $\mathbb{R} - S$; if $\mu\{x_n\} = a_n > 0$, $F$
has a jump at $x_n$ of magnitude $a_n$. If $x, y \in S$ and no point of $S$ lies between
$x$ and $y$, then $F$ is constant on $[x, y)$. For if $x \le b < y$, then $F(b) - F(x) =
\mu(x, b] = 0$.

Now if we take $S$ to be the rational numbers, the above discussion yields a
monotone function $F$ from $\mathbb{R}$ to $\mathbb{R}$ that is continuous at each irrational point
and discontinuous at each rational point.
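
A finite truncation makes the jump structure visible. The sketch below is an illustration under assumptions, not the full construction: it places hypothetical masses $a_k = 2^{-(k+1)}$ on the seven rationals in $[0, 1]$ with denominator at most 4, normalizes $F(-\infty) = 0$, and uses exact rational arithmetic. With all rationals in place of this finite sample there would be no adjacent points of $S$, so the "constant between neighbors" check applies only to the truncation.

```python
from fractions import Fraction

# Hypothetical finite sample of S: rationals in [0, 1] with denominator <= 4.
POINTS = sorted({Fraction(p, q) for q in range(1, 5) for p in range(q + 1)})
# Hypothetical masses a_k = 2^-(k+1), indexed along the sorted points.
MASS = {x: Fraction(1, 2 ** (k + 1)) for k, x in enumerate(POINTS)}


def mu(is_member):
    """mu(A) = sum of a_i over the points x_i in A; A is given as a predicate."""
    return sum(a for x, a in MASS.items() if is_member(x))


def F(x):
    """F(x) = mu(-inf, x], using the normalization F(-inf) = 0."""
    return mu(lambda s: s <= x)


def F_left(x):
    """F(x^-) = mu(-inf, x)."""
    return mu(lambda s: s < x)


# The jump of F at each x_n equals a_n.
print([(str(x), str(F(x) - F_left(x))) for x in POINTS])
# F is constant on [1/3, 1/2), since no sampled point lies strictly between.
print(F(Fraction(1, 3)), F(Fraction(3, 8)))
```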

If $F$ is an increasing, right-continuous, real-valued function defined on a
closed bounded interval $[a, b]$, there is a corresponding finite measure $\mu$ on
the Borel subsets of $[a, b]$; explicitly, $\mu$ is determined by the requirement
that $\mu(a', b'] = F(b') - F(a')$, $a \le a' < b' \le b$. The easiest way to establish
the correspondence is to extend $F$ by defining $F(x) = F(b)$, $x \ge b$; $F(x)
= F(a)$, $x \le a$; then take $\mu$ as the Lebesgue-Stieltjes measure corresponding
to $F$, restricted to $\mathcal{B}[a, b]$.

We are going to consider Lebesgue-Stieltjes measures and distribution
functions in Euclidean $n$-space. First, some terminology is required.


1.4.6 Definitions and Comments. If $a = (a_1, \ldots, a_n)$, $b = (b_1, \ldots, b_n) \in R^n$,
the interval $(a, b]$ is defined as $\{x = (x_1, \ldots, x_n) \in R^n : a_i < x_i \le b_i$ for all
$i = 1, \ldots, n\}$; $(a, \infty)$ is defined as $\{x \in R^n : x_i > a_i$ for all $i = 1, \ldots, n\}$,
$(-\infty, b]$ as $\{x \in R^n : x_i \le b_i$ for all $i = 1, \ldots, n\}$; other types
of intervals are defined similarly. The smallest $\sigma$-field containing all intervals
$(a, b]$, $a, b \in R^n$, is called the class of Borel sets of $R^n$, written $\mathscr{B}(R^n)$.
The Borel sets form the minimal $\sigma$-field over many other classes of sets, for
example, the open sets, the intervals $[a, b)$, and so on, exactly as in the
discussion of the one-dimensional case in 1.2.4. The class of Borel sets of $\overline{R}^n$,
written $\mathscr{B}(\overline{R}^n)$, is defined similarly.

A Lebesgue-Stieltjes measure on $R^n$ is a measure $\mu$ on $\mathscr{B}(R^n)$ such that
$\mu(I) < \infty$ for each bounded interval $I$.

The notion of a distribution function on $R^n$, $n \ge 2$, is more complicated than
in the one-dimensional case. To see why, assume for simplicity that $n = 3$,
and let $\mu$ be a finite measure on $\mathscr{B}(R^3)$. Define

$$F(x_1, x_2, x_3) = \mu\{\omega \in R^3 : \omega_1 \le x_1,\ \omega_2 \le x_2,\ \omega_3 \le x_3\}, \qquad (x_1, x_2, x_3) \in R^3.$$

By analogy with the one-dimensional case, we expect that $F$ is a distribution
function corresponding to $\mu$ [see formula (6) of 1.4.5]. This will turn out
to be correct, but the correspondence is no longer by means of the formula
$\mu(a, b] = F(b) - F(a)$. To see this, we compute $\mu(a, b]$ in terms of $F$.
Introduce the difference operator $\triangle$ as follows:
If $G: R^n \to R$, $\triangle_{b_i a_i} G(x_1, \ldots, x_n)$ is defined as

$$G(x_1, \ldots, x_{i-1}, b_i, x_{i+1}, \ldots, x_n) - G(x_1, \ldots, x_{i-1}, a_i, x_{i+1}, \ldots, x_n).$$


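1.4.7 Lemma. If $a \le b$, that is, $a_i \le b_i$, $i = 1, 2, 3$, then

(a) $\mu(a, b] = \triangle_{b_1 a_1} \triangle_{b_2 a_2} \triangle_{b_3 a_3} F(x_1, x_2, x_3)$, where

(b) $\triangle_{b_1 a_1} \triangle_{b_2 a_2} \triangle_{b_3 a_3} F(x_1, x_2, x_3)$
$= F(b_1, b_2, b_3) - F(a_1, b_2, b_3) - F(b_1, a_2, b_3) - F(b_1, b_2, a_3)$
$\quad + F(a_1, a_2, b_3) + F(a_1, b_2, a_3) + F(b_1, a_2, a_3) - F(a_1, a_2, a_3)$.

Thus $\mu(a, b]$ is not simply $F(b) - F(a)$.

PROOF.
(a) We have

$$\triangle_{b_3 a_3} F(x_1, x_2, x_3) = F(x_1, x_2, b_3) - F(x_1, x_2, a_3)$$
$$= \mu\{\omega : \omega_1 \le x_1,\ \omega_2 \le x_2,\ \omega_3 \le b_3\} - \mu\{\omega : \omega_1 \le x_1,\ \omega_2 \le x_2,\ \omega_3 \le a_3\}$$
$$= \mu\{\omega : \omega_1 \le x_1,\ \omega_2 \le x_2,\ a_3 < \omega_3 \le b_3\}$$

since $a_3 \le b_3$. Similarly,

$$\triangle_{b_2 a_2} \triangle_{b_3 a_3} F(x_1, x_2, x_3) = \mu\{\omega : \omega_1 \le x_1,\ a_2 < \omega_2 \le b_2,\ a_3 < \omega_3 \le b_3\}$$

and

$$\triangle_{b_1 a_1} \triangle_{b_2 a_2} \triangle_{b_3 a_3} F(x_1, x_2, x_3) = \mu\{\omega : a_1 < \omega_1 \le b_1,\ a_2 < \omega_2 \le b_2,\ a_3 < \omega_3 \le b_3\}.$$

(b) We have

$$\triangle_{b_3 a_3} F(x_1, x_2, x_3) = F(x_1, x_2, b_3) - F(x_1, x_2, a_3),$$
$$\triangle_{b_2 a_2} \triangle_{b_3 a_3} F(x_1, x_2, x_3) = F(x_1, b_2, b_3) - F(x_1, a_2, b_3) - F(x_1, b_2, a_3) + F(x_1, a_2, a_3).$$

Thus $\triangle_{b_1 a_1} \triangle_{b_2 a_2} \triangle_{b_3 a_3} F(x_1, x_2, x_3)$ is the desired expression. $\square$

The extension of 1.4.7 to $n$ dimensions is clear.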


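1.4.8 Theorem. Let $\mu$ be a finite measure on $\mathscr{B}(R^n)$ and define

$$F(x) = \mu(-\infty, x] = \mu\{\omega : \omega_i \le x_i,\ i = 1, \ldots, n\}.$$

If $a \le b$, then

(a) $\mu(a, b] = \triangle_{b_1 a_1} \cdots \triangle_{b_n a_n} F(x_1, \ldots, x_n)$, where

(b) $\triangle_{b_1 a_1} \cdots \triangle_{b_n a_n} F(x_1, \ldots, x_n) = F_0 - F_1 + F_2 - \cdots + (-1)^n F_n$;

$F_i$ is the sum of all $\binom{n}{i}$ terms of the form $F(c_1, \ldots, c_n)$ with $c_j = a_j$ for
exactly $i$ integers in $\{1, 2, \ldots, n\}$, and $c_j = b_j$ for the remaining $n - i$ integers.

PROOF. Apply the computations of 1.4.7. $\square$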
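
The alternating sum in 1.4.8(b) can be checked on a small example. The following is a minimal sketch, assuming a finite measure given by a handful of point masses in $R^3$ (the dictionary `atoms` and the function names are illustrative, not from the text). It evaluates the corner sum $F_0 - F_1 + \cdots + (-1)^n F_n$ and compares it with the mass of the box $(a, b]$ computed directly.

```python
from itertools import product

def box_increment(F, a, b):
    """Corner sum of 1.4.8(b): each corner c has c_i in {a_i, b_i} and
    carries the sign (-1)^(number of coordinates where c_i = a_i)."""
    n = len(a)
    total = 0
    for choice in product((0, 1), repeat=n):   # 1 -> use a_i, 0 -> use b_i
        corner = tuple(a[i] if choice[i] else b[i] for i in range(n))
        total += (-1) ** sum(choice) * F(corner)
    return total

# hypothetical finite measure on R^3: point masses at four points
atoms = {(0, 0, 0): 1, (1, 2, 1): 2, (2, 1, 3): 3, (3, 3, 3): 4}

def F(x):
    # F(x) = mu(-inf, x]: total mass of atoms p with p_i <= x_i for all i
    return sum(m for p, m in atoms.items()
               if all(pi <= xi for pi, xi in zip(p, x)))

def mu_box(a, b):
    # mu(a, b] computed directly from the atoms
    return sum(m for p, m in atoms.items()
               if all(ai < pi <= bi for ai, pi, bi in zip(a, p, b)))

a, b = (0, 1, 0), (2, 2, 3)
via_corners = box_increment(F, a, b)   # alternating corner sum
direct = mu_box(a, b)                  # mass of the box itself
naive = F(b) - F(a)                    # the one-dimensional formula
```

In this example the corner sum and the direct computation agree, while the naive difference $F(b) - F(a)$ gives a different value, which is the point of Lemma 1.4.7.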


We know that a distribution function on $R$ determines a corresponding
Lebesgue-Stieltjes measure. This is true in $n$ dimensions if we change the
definition of increasing.

Let $F: R^n \to R$, and, for $a \le b$, let $F(a, b]$ denote

$$\triangle_{b_1 a_1} \cdots \triangle_{b_n a_n} F(x_1, \ldots, x_n).$$

The function $F$ is said to be increasing iff $F(a, b] \ge 0$ whenever $a \le b$;
$F$ is right-continuous iff it is right-continuous in all variables together,
in other words, for any sequence $x^1 \ge x^2 \ge \cdots \ge x^k \ge \cdots \to x$ we have
$F(x^k) \to F(x)$.

An increasing right-continuous $F: R^n \to R$ is said to be a distribution
function on $R^n$. (Note that if $F$ arises from a measure $\mu$ as in 1.4.8, $F$ is a
distribution function.)

If $F$ is a distribution function on $R^n$, we set $\mu(a, b] = F(a, b]$ [this reduces
to $F(b) - F(a)$ if $n = 1$]. We are going to show that $\mu$ has a unique extension
to a Lebesgue-Stieltjes measure on $R^n$. The technique of the proof is the
same in any dimension, but to avoid cumbersome notation and to capture
the essential ideas, we sometimes specialize to the case $n = 2$. We break the
argument into several steps:


(1) If $a \le a' \le b' \le b$, $I = (a, b]$ is the union of the nine disjoint intervals
$I_1, \ldots, I_9$ formed by first constraining the first coordinate in one of the
following three ways:

$$a_1 < x \le a_1', \qquad a_1' < x \le b_1', \qquad b_1' < x \le b_1,$$

and then constraining the second coordinate in one of the following three
ways:

$$a_2 < y \le a_2', \qquad a_2' < y \le b_2', \qquad b_2' < y \le b_2.$$

For example, a typical set in the union is

$$\{(x, y) : b_1' < x \le b_1,\ a_2 < y \le a_2'\};$$

in $n$ dimensions we would obtain $3^n$ such sets.
Result (1) may be verified by looking at Fig. 1.4.1.

(2) In (1), $F(I) = \sum_{j=1}^{9} F(I_j)$; hence $a \le a' \le b' \le b$ implies
$F(a', b'] \le F(a, b]$.

This is verified by brute force, using 1.4.8.

Now a right-semiclosed interval $(a, b]$ in $\overline{R}^n$ is, by convention, a set of
the form $\{(x_1, \ldots, x_n) : a_i < x_i \le b_i,\ i = 1, \ldots, n\}$, $a, b \in \overline{R}^n$, with the
proviso that $a_i < x_i \le b_i$ can be replaced by $a_i \le x_i \le b_i$ if $a_i = -\infty$. With this
assumption, the set $\mathscr{F}_0(\overline{R}^n)$ of finite disjoint unions of right-semiclosed
intervals is a field. (The corresponding convention in $R^n$ is that $a_i < x_i \le b_i$
can be replaced by $a_i < x_i < b_i$ if $b_i = +\infty$. Both conventions are dictated
by considerations similar to those of the one-dimensional case; see 1.2.2.)


(3) If $a$ and $b$ belong to $\overline{R}^n$ but not to $R^n$, we define $F(a, b]$ as the limit
of $F(a', b']$ where $a', b' \in R^n$, $a'$ decreases to $a$, and $b'$ increases to $b$. [The
definition is sensible because of the monotonicity property in (2).] Similarly,
if $a \in R^n$, $b \in \overline{R}^n - R^n$, we take $F(a, b] = \lim_{b' \uparrow b} F(a, b']$; if $a \in \overline{R}^n - R^n$,
$b \in R^n$, $F(a, b] = \lim_{a' \downarrow a} F(a', b]$.

Thus we define $\mu$ on right-semiclosed intervals of $\overline{R}^n$; $\mu$ extends to a finitely
additive set function on $\mathscr{F}_0(\overline{R}^n)$, as in the discussion after 1.4.2. [There is a
slight problem here; a given interval $I$ may be expressible as a finite disjoint
union of intervals $I_1, \ldots, I_r$, so that for the extension to be well defined we
must have $F(I) = \sum_{j=1}^{r} F(I_j)$; but this follows just as in (2).]

Figure 1.4.1.


(4) The set function $\mu$ is countably additive on $\mathscr{F}_0(\overline{R}^n)$.

First assume that $\mu(\overline{R}^n)$ is finite. If $a \in R^n$, $F(a', b] \to F(a, b]$ as $a'$ decreases
to $a$ by the right-continuity of $F$ and 1.4.8(b); if $a \in \overline{R}^n - R^n$, the same result
holds by (3). The argument then proceeds word for word as in 1.4.3.

Now assume $\mu(\overline{R}^n) = \infty$. Then $F$, restricted to $C_k = \{x : -k < x_i \le k,\ i = 1, \ldots, n\}$,
induces a finite-valued set function $\mu_k$ on $\mathscr{F}_0(\overline{R}^n)$ that is
concentrated on $C_k$, so that $\mu_k(B) = \mu(B \cap C_k)$, $B \in \mathscr{F}_0(\overline{R}^n)$. Since $\mu_k \le \mu$ and
$\mu_k \to \mu$ on $\mathscr{F}_0(\overline{R}^n)$, the proof of 1.4.3 applies verbatim.


1.4.9 Theorem. Let $F$ be a distribution function on $R^n$, and let
$\mu(a, b] = F(a, b]$, $a, b \in R^n$, $a \le b$. There is a unique extension of $\mu$ to a
Lebesgue-Stieltjes measure on $R^n$.

PROOF. Repeat the proof of 1.4.4, with appropriate notational changes. For
example, in extending $\mu$ to $\mathscr{F}_0(R^n)$, the field of finite disjoint unions of
right-semiclosed intervals of $R^n$, we take (say for $n = 3$)

$$\mu\{(x, y, z) : a_1 < x \le b_1,\ a_2 < y < \infty,\ a_3 < z < \infty\} = \lim_{b_2, b_3 \to \infty} F(a, b].$$

The one-to-one $\mu$-preserving correspondence is given by

$(a, b] \to (a, b]$ if $a, b \in R^n$, or if $b \in R^n$ and at least one component of
$a$ is $-\infty$;

also, if the interval $\{(x_1, \ldots, x_n) : a_i < x_i \le b_i,\ i = 1, \ldots, n\}$ has some
$b_i = \infty$, the corresponding interval in $R^n$ has $a_i < x_i < \infty$. The remainder
of the proof is as before. $\square$


1.4.10 Examples. (a) Let $F_1, F_2, \ldots, F_n$ be distribution functions on $R$,
and define $F(x_1, \ldots, x_n) = F_1(x_1) F_2(x_2) \cdots F_n(x_n)$. Then $F$ is a distribution
function on $R^n$ since

$$F(a, b] = \prod_{i=1}^{n} [F_i(b_i) - F_i(a_i)].$$

In particular, if $F_i(x_i) = x_i$, $i = 1, \ldots, n$, then each $F_i$ corresponds to Lebesgue
measure on $\mathscr{B}(R)$. In this case we have $F(x_1, \ldots, x_n) = x_1 x_2 \cdots x_n$ and
$\mu(a, b] = F(a, b] = \prod_{i=1}^{n} (b_i - a_i)$. Thus the measure of any rectangular box is
its volume; $\mu$ is called Lebesgue measure on $\mathscr{B}(R^n)$. Just as in one dimension,
the completion of $\mathscr{B}(R^n)$ relative to Lebesgue measure is called the class of
Lebesgue measurable sets in $R^n$, written $\overline{\mathscr{B}}(R^n)$.
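
As a quick sanity check of the product formula, the following minimal sketch (function names are illustrative assumptions) evaluates the alternating corner sum of 1.4.8(b) for $F(x_1, x_2, x_3) = x_1 x_2 x_3$ and compares it with the volume $\prod_i (b_i - a_i)$ of the box.

```python
from itertools import product
from math import prod

def box_increment(F, a, b):
    """F(a, b] as the alternating sum of F over the 2^n corners of the box."""
    n = len(a)
    total = 0
    for choice in product((0, 1), repeat=n):   # 1 -> use a_i, 0 -> use b_i
        corner = tuple(a[i] if choice[i] else b[i] for i in range(n))
        total += (-1) ** sum(choice) * F(corner)
    return total

def F(x):
    # product of the one-dimensional distribution functions F_i(x_i) = x_i
    return prod(x)

a, b = (1, -2, 0), (4, 3, 2)
increment = box_increment(F, a, b)
volume = prod(bi - ai for ai, bi in zip(a, b))
```

Expanding $\prod_i (b_i - a_i)$ term by term produces exactly the signed corner products, which is why the two quantities coincide.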

(b) Let $f$ be a nonnegative function from $R^n$ to $R$ such that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n < \infty.$$

(For now, we assume the integration is in the Riemann sense.) Define

$$F(x) = \int_{(-\infty, x]} f(t)\, dt,$$

that is,

$$F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \ldots, t_n)\, dt_1 \cdots dt_n.$$

Then

$$\triangle_{b_n a_n} F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{n-1}} \int_{a_n}^{b_n} f(t_1, \ldots, t_n)\, dt_1 \cdots dt_n,$$

and we find by repeating this computation that

$$F(a, b] = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(t_1, \ldots, t_n)\, dt_1 \cdots dt_n.$$

Thus $F$ is a distribution function. If $\mu$ is the Lebesgue-Stieltjes measure
determined by $F$, we have

$$\mu(a, b] = \int_{(a, b]} f(x)\, dx.$$

We have seen that if $F$ is a distribution function on $R^n$, there is a unique
Lebesgue-Stieltjes measure determined by $\mu(a, b] = F(a, b]$, $a \le b$. Also,
if $\mu$ is a finite measure on $\mathscr{B}(R^n)$ and $F(x) = \mu(-\infty, x]$, $x \in R^n$, then $F$
is a distribution function on $R^n$ and $\mu(a, b] = F(a, b]$, $a \le b$. It is possible
to associate a distribution function with an arbitrary Lebesgue-Stieltjes
measure on $R^n$, and thus establish a one-to-one correspondence between
Lebesgue-Stieltjes measures and distribution functions, provided distribution
functions with the same increments $F(a, b]$, $a, b \in R^n$, $a \le b$, are identified.
The result will not be needed, and the details are quite tedious and will be
omitted.

The following result shows that under appropriate conditions, a Borel set 
can be approximated from below by a compact set, and from above by an 
open set. 


1.4.11 Theorem. If $\mu$ is a $\sigma$-finite measure on $\mathscr{B}(R^n)$, then for each
$B \in \mathscr{B}(R^n)$,

(a) $\mu(B) = \sup\{\mu(K) : K \subset B,\ K \text{ compact}\}$.

If $\mu$ is in fact a Lebesgue-Stieltjes measure, then

(b) $\mu(B) = \inf\{\mu(V) : V \supset B,\ V \text{ open}\}$.

(c) There is an example of a $\sigma$-finite measure on $\mathscr{B}(R^n)$ that is not a
Lebesgue-Stieltjes measure and for which (b) fails.


PROOF.

(a) First assume that $\mu$ is finite. Let $\mathscr{C}$ be the class of subsets of $R^n$ having
the desired property; we show that $\mathscr{C}$ is a monotone class. Indeed,
let $B_n \in \mathscr{C}$, $B_n \uparrow B$. Let $K_n$ be a compact subset of $B_n$ with
$\mu(B_n) \le \mu(K_n) + \varepsilon$, $\varepsilon > 0$ preassigned. By replacing $K_n$ by $\bigcup_{i=1}^{n} K_i$, we may
assume the $K_n$ form an increasing sequence. Then
$\mu(B) = \lim_{n \to \infty} \mu(B_n) \le \lim_{n \to \infty} \mu(K_n) + \varepsilon$, so that

$$\mu(B) = \sup\{\mu(K) : K \text{ a compact subset of } B\},$$

and $B \in \mathscr{C}$. If $B_n \in \mathscr{C}$, $B_n \downarrow B$, let $K_n$ be a compact subset of $B_n$ such
that $\mu(B_n) \le \mu(K_n) + \varepsilon 2^{-n}$, and set $K = \bigcap_{n=1}^{\infty} K_n$. Then

$$\mu(B) - \mu(K) = \mu(B - K) \le \mu\Bigl(\bigcup_{n=1}^{\infty} (B_n - K_n)\Bigr) \le \sum_{n=1}^{\infty} \mu(B_n - K_n) \le \varepsilon;$$

thus $B \in \mathscr{C}$. Therefore $\mathscr{C}$ is a monotone class containing all finite disjoint
unions of right-semiclosed intervals (a right-semiclosed interval is the
limit of an increasing sequence of compact intervals). Hence $\mathscr{C}$ contains
all Borel sets.

If $\mu$ is $\sigma$-finite, each $B \in \mathscr{B}(R^n)$ is the limit of an increasing sequence
of sets $B_i$ of finite measure. Each $B_i$ can be approximated from within
by compact sets [apply the previous argument to the measure given by
$\mu_i(A) = \mu(A \cap B_i)$, $A \in \mathscr{B}(R^n)$], and the preceding argument that $\mathscr{C}$ is
closed under limits of increasing sequences shows that $B \in \mathscr{C}$.

(b) We have

$$\mu(B) \le \inf\{\mu(V) : V \supset B,\ V \text{ open}\} \le \inf\{\mu(W) : W \supset B,\ W = K^c,\ K \text{ compact}\}.$$

If $\mu$ is finite, this equals $\mu(B)$ by (a) applied to $B^c$, and the result follows.

Now assume $\mu$ is an arbitrary Lebesgue-Stieltjes measure, and write
$R^n = \bigcup_{k=1}^{\infty} B_k$, where the $B_k$ are disjoint bounded sets; then $B_k \subset C_k$
for some bounded open set $C_k$. The measure $\mu_k(A) = \mu(A \cap C_k)$,
$A \in \mathscr{B}(R^n)$, is finite; hence if $B$ is a Borel subset of $B_k$ and $\varepsilon > 0$,
there is an open set $W_k \supset B$ such that $\mu_k(W_k) \le \mu_k(B) + \varepsilon 2^{-k}$. Now
$W_k \cap C_k$ is an open set $V_k$ and $B \cap C_k = B$ since $B \subset B_k \subset C_k$; hence
$\mu(V_k) \le \mu(B) + \varepsilon 2^{-k}$. For any $A \in \mathscr{B}(R^n)$, let $V_k$ be an open set
with $V_k \supset A \cap B_k$ and $\mu(V_k) \le \mu(A \cap B_k) + \varepsilon 2^{-k}$. Then $V = \bigcup_{k=1}^{\infty} V_k$
is open, $V \supset A$, and $\mu(V) \le \sum_{k=1}^{\infty} \mu(V_k) \le \mu(A) + \varepsilon$.

(c) Construct a measure $\mu$ on $\mathscr{B}(R)$ as follows. Let $\mu$ be concentrated
on $S = \{1/n : n = 1, 2, \ldots\}$ and take $\mu\{1/n\} = 1/n$ for all $n$. Since
$R = \bigcup_{n=1}^{\infty} \{1/n\} \cup S^c$ and $\mu(S^c) = 0$, $\mu$ is $\sigma$-finite. Since

$$\mu[0, 1] = \sum_{n=1}^{\infty} \frac{1}{n} = \infty,$$

$\mu$ is not a Lebesgue-Stieltjes measure. Now $\mu\{0\} = 0$, but if $V$ is an
open set containing 0, we have

$$\mu(V) \ge \mu(-\varepsilon, \varepsilon) \ \text{ for some } \varepsilon \ \ge\ \sum_{k=r}^{\infty} \frac{1}{k} \ \text{ for some } r \ = \infty.$$

Thus (b) fails. [Another example: Let $\mu(A)$ be the number of rational
points in $A$.] $\square$
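
The divergence used in part (c) can be observed numerically. The following is a minimal sketch, assuming we truncate the atoms at $k \le$ `n_max` (the function name and the truncation are illustrative devices; the measure itself has infinitely many atoms). It sums the masses $1/k$ of the atoms $1/k$ lying in $(-\varepsilon, \varepsilon)$.

```python
import math

def mass_near_zero(eps, n_max):
    """Partial sum of mu{1/k} = 1/k over the atoms 1/k < eps with k <= n_max.
    As n_max grows this is a partial sum of a divergent harmonic tail."""
    return math.fsum(1.0 / k for k in range(1, n_max + 1) if 1.0 / k < eps)

small = mass_near_zero(0.1, 10**4)   # atoms 1/11, ..., 1/10000
large = mass_near_zero(0.1, 10**5)   # atoms 1/11, ..., 1/100000
```

The partial sums keep growing (roughly like the logarithm of `n_max`) however small `eps` is taken, which reflects the fact that every open set containing 0 has infinite measure while $\mu\{0\} = 0$.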


Problems

1. Let $F$ be the distribution function on $R$ given by $F(x) = 0$, $x < -1$;
$F(x) = 1 + x$, $-1 \le x < 0$; $F(x) = 2 + x^2$, $0 \le x < 2$; $F(x) = 9$, $x \ge 2$.
If $\mu$ is the Lebesgue-Stieltjes measure corresponding to $F$, compute
the measure of each of the following sets:

(a) $\{2\}$,
(b) $[-\tfrac{1}{2}, 3)$,
(c) $(-1, 0] \cup (1, 2)$,
(d) $[0, \tfrac{1}{2}) \cup (1, 2]$,
(e) $\{x : |x| + 2x^2 > 1\}$.


2. Let $\mu$ be a Lebesgue-Stieltjes measure on $R$ corresponding to a
continuous distribution function.

(a) If $A$ is a countable subset of $R$, show that $\mu(A) = 0$.

(b) If $\mu(A) > 0$, must $A$ include an open interval?

(c) If $\mu(A) > 0$ and $\mu(R - A) = 0$, must $A$ be dense in $R$?

(d) Do the answers to (b) or (c) change if $\mu$ is restricted to be Lebesgue
measure?

3. If $B$ is a Borel set in $R^n$ and $a \in R^n$, show that $a + B = \{a + x : x \in B\}$
is a Borel set, and $-B = \{-x : x \in B\}$ is a Borel set. (Use the good sets
principle.)


4. Show that if $B \in \overline{\mathscr{B}}(R^n)$, $a \in R^n$, then $a + B \in \overline{\mathscr{B}}(R^n)$ and
$\mu(a + B) = \mu(B)$, where $\mu$ is Lebesgue measure. Thus Lebesgue measure is
translation-invariant. (The good sets principle works here also, in
conjunction with the monotone class theorem.)


5. Let $\mu$ be a Lebesgue-Stieltjes measure on $\mathscr{B}(R^n)$ such that
$\mu(a + I) = \mu(I)$ for all $a \in R^n$ and all (right-semiclosed) intervals $I$ in $R^n$. In
other words, $\mu$ is translation-invariant on intervals. Show that $\mu$ is a
constant times Lebesgue measure.


6. (A set that is not Lebesgue measurable) Call two real numbers $x$ and
$y$ equivalent iff $x - y$ is rational. Choose a member of each distinct
equivalence class $B_x = \{y : y - x \text{ rational}\}$ to form a set $A$ (this requires
the axiom of choice). Assume that the representatives are chosen so that
$A \subset [0, 1]$. Establish the following:

(a) If $r$ and $s$ are distinct rational numbers, $(r + A) \cap (s + A) = \emptyset$;
also $R = \bigcup\{r + A : r \text{ rational}\}$.

(b) If $A$ is Lebesgue measurable (so that $r + A$ is Lebesgue measurable
by Problem 4), then $\mu(r + A) = 0$ for all rational $r$ ($\mu$ is Lebesgue
measure). Conclude that $A$ cannot be Lebesgue measurable.

The only properties of Lebesgue measure needed in this problem are
translation-invariance and finiteness on bounded intervals. Therefore,
the result implies that there is no translation-invariant measure $\lambda$ (except
$\lambda \equiv 0$) on the class of all subsets of $R$ such that $\lambda(I) < \infty$ for each
bounded interval $I$.

7. (The Cantor ternary set) Let $E_1$ be the middle third of the interval $[0, 1]$,
that is, $E_1 = (\tfrac{1}{3}, \tfrac{2}{3})$; thus $x \in [0, 1] - E_1$ iff $x$ can be written in ternary
form using 0 or 2 in the first digit. Let $E_2$ be the union of the middle
thirds of the two intervals that remain after $E_1$ is removed, that is,
$E_2 = (\tfrac{1}{9}, \tfrac{2}{9}) \cup (\tfrac{7}{9}, \tfrac{8}{9})$; thus $x \in [0, 1] - (E_1 \cup E_2)$ iff $x$ can be written in
ternary form using 0 or 2 in the first two digits. Continue the construction;
let $E_n$ be the union of the middle thirds of the intervals that remain
after $E_1, \ldots, E_{n-1}$ are removed. The Cantor ternary set $C$ is defined
as $[0, 1] - \bigcup_{n=1}^{\infty} E_n$; thus $x \in C$ iff $x$ can be expressed in ternary form
using only digits 0 and 2. Various topological properties of $C$ follow
from the definition: $C$ is closed, perfect (every point of $C$ is a limit
point of $C$), and nowhere dense.

Show that $C$ is uncountable and has Lebesgue measure 0.

Comment. In the above construction, we have $m(E_n) = (\tfrac{1}{3})(\tfrac{2}{3})^{n-1}$,
$n = 1, 2, \ldots$, where $m$ is Lebesgue measure. If $0 < \alpha < 1$, the
procedure may be altered slightly so that $m(E_n) = \alpha(\tfrac{1}{2})^n$. We then
obtain a set $C(\alpha)$, homeomorphic to $C$, of measure $1 - \alpha$; such sets are
called Cantor sets of positive measure.
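
The lengths quoted in the Comment can be checked by carrying out the first few stages of the construction exactly. The following is a minimal sketch using exact rational arithmetic (the function names are illustrative, and this is a numerical check, not a solution of the problem): it keeps the closed intervals that remain after $E_1, \ldots, E_n$ are removed and adds up their lengths.

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals remaining in [0, 1] after E_1, ..., E_n are removed."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            next_intervals.append((lo, lo + third))   # left third is kept
            next_intervals.append((hi - third, hi))   # right third is kept
        intervals = next_intervals
    return intervals

def remaining_length(n):
    # total length left after n stages; equals (2/3)^n
    return sum(hi - lo for lo, hi in cantor_stage(n))

# m(E_3) = length removed at stage 3 = (1/3)(2/3)^2
removed_at_stage_3 = remaining_length(2) - remaining_length(3)
```

Since the remaining length after $n$ stages is $(2/3)^n$, it tends to 0 as $n \to \infty$, which is consistent with $C$ having Lebesgue measure 0.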


8. Give an example of a function $F: R^2 \to R$ such that $F$ is right-continuous
and is increasing in each coordinate separately, but $F$ is not a
distribution function on $R^2$.

9. A distribution function on $R$ is monotone and thus has only countably
many points of discontinuity. Is this also true for a distribution function
on $R^n$, $n > 1$?

10. (a) Let $F$ and $G$ be distribution functions on $R^n$. If $F(a, b] = G(a, b]$
for all $a, b \in R^n$, $a \le b$, does it follow that $F$ and $G$ differ by a
constant?

(b) Must a distribution function on $R^n$ be increasing in each coordinate
separately?

*11. If $c$ is the cardinality of the reals, show that there are only $c$ Borel
subsets of $R^n$, but $2^c$ Lebesgue measurable sets.


1.5 MEASURABLE FUNCTIONS AND INTEGRATION


If $f$ is a real-valued function defined on a bounded interval $[a, b]$ of reals,
we can talk about the Riemann integral of $f$, at least if $f$ is piecewise
continuous. We are going to develop a much more general integration process, one
that applies to functions from an arbitrary set to the extended reals, provided
that certain "measurability" conditions are satisfied.

Probability considerations may again be used to motivate the concept of
measurability. Suppose that $(\Omega, \mathscr{F}, P)$ is a probability space, and that $h$ is a
function from $\Omega$ to $R$. Thus if the outcome of the experiment corresponds
to the point $\omega \in \Omega$, we may compute the number $h(\omega)$. Suppose that we are
interested in the probability that $a \le h(\omega) \le b$, in other words, we wish to
compute $P\{\omega : h(\omega) \in B\}$ where $B = [a, b]$. For this to be possible, the set
$\{\omega : h(\omega) \in B\} = h^{-1}(B)$ must belong to the $\sigma$-field $\mathscr{F}$. If $h^{-1}(B) \in \mathscr{F}$ for
each interval $B$ (and hence, as we shall see below, for each Borel set $B$), then
$h$ is a "measurable function," in other words, probabilities of events involving
$h$ can be computed. In the language of probability theory, $h$ is a "random
variable."


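1.5.1 Definitions and Comments. If $h: \Omega_1 \to \Omega_2$, $h$ is measurable relative
to the $\sigma$-fields $\mathscr{F}_j$ of subsets of $\Omega_j$, $j = 1, 2$, iff $h^{-1}(A) \in \mathscr{F}_1$ for each $A \in \mathscr{F}_2$.

It is sufficient that $h^{-1}(A) \in \mathscr{F}_1$ for each $A \in \mathscr{C}$, where $\mathscr{C}$ is a class of subsets
of $\Omega_2$ such that the minimal $\sigma$-field over $\mathscr{C}$ is $\mathscr{F}_2$. For $\{A \in \mathscr{F}_2 : h^{-1}(A) \in \mathscr{F}_1\}$
is a $\sigma$-field that contains all sets of $\mathscr{C}$, hence coincides with $\mathscr{F}_2$. This is another
application of the good sets principle.

The notation $h: (\Omega_1, \mathscr{F}_1) \to (\Omega_2, \mathscr{F}_2)$ will mean that $h: \Omega_1 \to \Omega_2$,
measurable relative to $\mathscr{F}_1$ and $\mathscr{F}_2$.

If $\mathscr{F}$ is a $\sigma$-field of subsets of $\Omega$, $(\Omega, \mathscr{F})$ is sometimes called a measurable
space, and the sets in $\mathscr{F}$ are sometimes called measurable sets.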
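
On finite sets the definition, and the remark that it suffices to check a generating class, can be verified by exhaustive computation. The following is a minimal sketch under that finiteness assumption (all names, the sample spaces, and the maps `h` and `g` are hypothetical illustrations); `sigma_field` closes a class of sets under complement and union, and `is_measurable` tests preimages.

```python
from itertools import combinations

def sigma_field(omega, generators):
    """Smallest family of subsets of a finite omega that contains the
    generators and is closed under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for A in list(family):
            if omega - A not in family:
                family.add(omega - A)
                changed = True
        for A, B in combinations(list(family), 2):
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family

def preimage(h, domain, A):
    return frozenset(w for w in domain if h[w] in A)

def is_measurable(h, domain, F1, sets):
    # h is measurable if the preimage of every listed set lies in F1
    return all(preimage(h, domain, A) in F1 for A in sets)

omega1 = {1, 2, 3, 4}
omega2 = {'a', 'b', 'c'}
F1 = sigma_field(omega1, [{1, 2}])   # {}, {1,2}, {3,4}, omega1
C = [{'a'}]                          # generating class for F2
F2 = sigma_field(omega2, C)          # {}, {a}, {b,c}, omega2

h = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}   # preimage of {a} is {1,2}
g = {1: 'a', 2: 'b', 3: 'b', 4: 'c'}   # preimage of {a} is {1}
```

Here checking `h` on the single generator `{'a'}` gives the same verdict as checking it on all of `F2`, while `g` already fails on the generator.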

Notice that measurability of $h$ does not imply that $h(A) \in \mathscr{F}_2$ for each
$A \in \mathscr{F}_1$. For example, if $\mathscr{F}_2 = \{\emptyset, \Omega_2\}$, then any $h: \Omega_1 \to \Omega_2$ is measurable,
regardless of $\mathscr{F}_1$, but if $A \in \mathscr{F}_1$ and $h(A)$ is a nonempty proper subset of $\Omega_2$,
then $h(A) \notin \mathscr{F}_2$. Actually, in measure theory, the inverse image is a much
more desirable object than the direct image since the basic set operations are
preserved by inverse images but not in general by direct images. Specifically,
we have $h^{-1}(\bigcup_i B_i) = \bigcup_i h^{-1}(B_i)$, $h^{-1}(\bigcap_i B_i) = \bigcap_i h^{-1}(B_i)$, and
$h^{-1}(B^c) = [h^{-1}(B)]^c$. We also have $h(\bigcup_i A_i) = \bigcup_i h(A_i)$, but
$h(\bigcap_i A_i) \subset \bigcap_i h(A_i)$, and the inclusion may be proper. Furthermore, $h(A^c)$ need
not equal $[h(A)]^c$; in fact, when $h$ is a constant function the two sets are disjoint.
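
A small finite example makes the contrast concrete. The following is a minimal sketch (the map `h` and the sets are hypothetical choices for illustration): inverse images commute with intersection and complement, while the direct image of an intersection can be strictly smaller than the intersection of the images.

```python
def image(h, A):
    return {h[w] for w in A}

def preimage(h, B):
    return {w for w in h if h[w] in B}

# hypothetical map from Omega_1 = {1, 2, 3} into Omega_2 = {'a', 'b', 'c'}
h = {1: 'a', 2: 'a', 3: 'b'}
omega1, omega2 = set(h), {'a', 'b', 'c'}

A1, A2 = {1, 3}, {2, 3}
image_of_intersection = image(h, A1 & A2)              # {'b'}
intersection_of_images = image(h, A1) & image(h, A2)   # {'a', 'b'}

B1, B2 = {'a'}, {'a', 'b'}
pre_of_intersection = preimage(h, B1 & B2)
intersection_of_pres = preimage(h, B1) & preimage(h, B2)
pre_of_complement = preimage(h, omega2 - B1)
complement_of_pre = omega1 - preimage(h, B1)
```

The proper inclusion arises because `h` is not one-to-one: the points 1 and 2 have the same image but lie in different sets.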

If $(\Omega, \mathscr{F})$ is a measurable space and $h: \Omega \to R^n$ (or $\overline{R}^n$), $h$ is said to be
Borel measurable [on $(\Omega, \mathscr{F})$] iff $h$ is measurable relative to the $\sigma$-fields $\mathscr{F}$
and $\mathscr{B}$, the class of Borel sets. If $\Omega$ is a Borel subset of $R^k$ (or $\overline{R}^k$) and we
use the term "Borel measurable," we always assume that $\mathscr{F} = \mathscr{B}$.

A continuous map $h$ from $R^k$ to $R^n$ is Borel measurable; for if $\mathscr{C}$ is the
class of open subsets of $R^n$, then $h^{-1}(A)$ is open, hence belongs to $\mathscr{B}(R^k)$,
for each $A \in \mathscr{C}$.

If $A$ is a subset of $R$ that is not a Borel set (Section 1.4, Problems 6 and
11) and $I_A$ is the indicator of $A$, that is, $I_A(\omega) = 1$ for $\omega \in A$ and 0 for $\omega \notin A$,
then $I_A$ is not Borel measurable; for $\{\omega : I_A(\omega) = 1\} = A \notin \mathscr{B}(R)$.

To show that a function $h: \Omega \to R$ (or $\overline{R}$) is Borel measurable, it is sufficient
to show that $\{\omega : h(\omega) > c\} \in \mathscr{F}$ for each real $c$. For if $\mathscr{C}$ is the class of
sets $\{x : x > c\}$, $c \in R$, then $\sigma(\mathscr{C}) = \mathscr{B}(R)$. Similarly, $\{\omega : h(\omega) > c\}$ can be
replaced by $\{\omega : h(\omega) \ge c\}$, $\{\omega : h(\omega) < c\}$ or $\{\omega : h(\omega) \le c\}$, or equally well
by $\{\omega : a \le h(\omega) \le b\}$ for all real $a$ and $b$, and so on.

If $(\Omega, \mathscr{F}, \mu)$ is a measure space, the terminology "$h$ is Borel measurable
on $(\Omega, \mathscr{F}, \mu)$" will mean that $h$ is Borel measurable on $(\Omega, \mathscr{F})$ and $\mu$ is a
measure on $\mathscr{F}$.


1.5.2 Definition. Let $(\Omega, \mathscr{F})$ be a measurable space, fixed throughout the
discussion. If $h: \Omega \to \overline{R}$, $h$ is said to be simple iff $h$ is Borel measurable and
takes on only finitely many distinct values. Equivalently, $h$ is simple iff it can
be written as a finite sum $\sum_{i=1}^{r} x_i I_{A_i}$, where the $A_i$ are disjoint sets in $\mathscr{F}$ and
$I_{A_i}$ is the indicator of $A_i$; the $x_i$ need not be distinct.

We assume the standard arithmetic of $\overline{R}$; if $a \in R$, $a + \infty = \infty$, $a - \infty = -\infty$,
$a/\infty = a/{-\infty} = 0$, $a \cdot \infty = \infty$ if $a > 0$, $a \cdot \infty = -\infty$ if $a < 0$,
$0 \cdot \infty = 0 \cdot (-\infty) = 0$, $\infty + \infty = \infty$, $-\infty - \infty = -\infty$, with commutativity
of addition and multiplication. It is then easy to check that sums, differences,
products, and quotients of simple functions are simple, as long as the
operations are well-defined, in other words we do not try to add $+\infty$ and
$-\infty$, divide by 0, or divide $\infty$ by $\infty$.

Let $\mu$ be a measure on $\mathscr{F}$, again fixed throughout the discussion. If
$h: \Omega \to \overline{R}$ is Borel measurable we are going to define the abstract Lebesgue
integral of $h$ with respect to $\mu$, written as $\int_\Omega h\, d\mu$, $\int_\Omega h(\omega)\mu(d\omega)$, or $\int_\Omega h(\omega)\, d\mu(\omega)$.


1.5.3 Definition of the Integral. First let $h$ be simple, say $h = \sum_{i=1}^{r} x_i I_{A_i}$,
where the $A_i$ are disjoint sets in $\mathscr{F}$. We define

$$\int_\Omega h\, d\mu = \sum_{i=1}^{r} x_i \mu(A_i)$$

as long as $+\infty$ and $-\infty$ do not both appear in the sum; if they do, we say
that the integral does not exist. Strictly speaking, it must be verified that if $h$
has a different representation, say $\sum_{j=1}^{s} y_j I_{B_j}$, then

$$\sum_{i=1}^{r} x_i \mu(A_i) = \sum_{j=1}^{s} y_j \mu(B_j).$$

(For example, if $A = B \cup C$, where $B \cap C = \emptyset$, then $x I_A = x I_B + x I_C$.) The
proof is based on the observation that

$$h = \sum_{i=1}^{r} \sum_{j=1}^{s} z_{ij} I_{A_i \cap B_j},$$

where $z_{ij} = x_i = y_j$. Thus

$$\sum_{i,j} z_{ij}\, \mu(A_i \cap B_j) = \sum_i x_i \sum_j \mu(A_i \cap B_j) = \sum_i x_i \mu(A_i)$$
$$= \sum_j y_j \mu(B_j) \quad \text{by a symmetrical argument.}$$
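
The independence of the representation can be illustrated on a finite measure space. The following is a minimal sketch, assuming a four-point space with the hypothetical point masses in `weights` (all names are illustrative); the same simple function is written in two ways and integrated with the formula $\sum_i x_i \mu(A_i)$.

```python
from fractions import Fraction

def integral_simple(terms, mu):
    """terms: list of pairs (x_i, A_i) with the A_i disjoint.
    Returns sum of x_i * mu(A_i), as in the definition of the integral."""
    return sum(x * mu(A) for x, A in terms)

# hypothetical measure on Omega = {1, 2, 3, 4} given by point masses
weights = {1: Fraction(1, 2), 2: Fraction(1, 4),
           3: Fraction(1, 8), 4: Fraction(1, 8)}

def mu(A):
    return sum(weights[w] for w in A)

# two representations of the function equal to 3 on {1, 2} and 5 on {3, 4}
rep_coarse = [(3, {1, 2}), (5, {3, 4})]
rep_fine = [(3, {1}), (3, {2}), (5, {3}), (5, {4})]

value_coarse = integral_simple(rep_coarse, mu)
value_fine = integral_simple(rep_fine, mu)
```

The finer representation plays the role of the common refinement $\{A_i \cap B_j\}$ used in the argument above.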


If $h$ is nonnegative Borel measurable, define

$$\int_\Omega h\, d\mu = \sup\Bigl\{\int_\Omega s\, d\mu : s \text{ simple},\ 0 \le s \le h\Bigr\}.$$

This agrees with the previous definition if $h$ is simple. Furthermore, we may
if we like restrict $s$ to be finite-valued.

Notice that according to the definition, the integral of a nonnegative Borel
measurable function always exists; it may be $+\infty$.

Finally, if $h$ is an arbitrary Borel measurable function, let $h^+ = \max(h, 0)$,
$h^- = \max(-h, 0)$, that is,

$$h^+(\omega) = h(\omega) \text{ if } h(\omega) \ge 0; \qquad h^+(\omega) = 0 \text{ if } h(\omega) < 0;$$
$$h^-(\omega) = -h(\omega) \text{ if } h(\omega) \le 0; \qquad h^-(\omega) = 0 \text{ if } h(\omega) > 0.$$

The function $h^+$ is called the positive part of $h$, $h^-$ the negative part.
We have $|h| = h^+ + h^-$, $h = h^+ - h^-$, and $h^+$ and $h^-$ are Borel measurable.
For example, $\{\omega : h^+(\omega) \in B\} = \{\omega : h(\omega) \ge 0,\ h(\omega) \in B\} \cup \{\omega : h(\omega) < 0,\ 0 \in B\}$.
The first set is $h^{-1}[0, \infty] \cap h^{-1}(B) \in \mathscr{F}$; the second is $h^{-1}[-\infty, 0)$
if $0 \in B$, and $\emptyset$ if $0 \notin B$. Thus $(h^+)^{-1}(B) \in \mathscr{F}$ for each $B \in \mathscr{B}(\overline{R})$,
and similarly for $h^-$. Alternatively, if $h_1$ and $h_2$ are Borel measurable, then
$\max(h_1, h_2)$ and $\min(h_1, h_2)$ are Borel measurable; to see this, note that

$$\{\omega : \max(h_1(\omega), h_2(\omega)) \le c\} = \{\omega : h_1(\omega) \le c\} \cap \{\omega : h_2(\omega) \le c\}$$

and $\{\omega : \min(h_1(\omega), h_2(\omega)) \le c\} = \{\omega : h_1(\omega) \le c\} \cup \{\omega : h_2(\omega) \le c\}$. It
follows that if $h$ is Borel measurable, so are $h^+$ and $h^-$.

We define

$$\int_\Omega h\, d\mu = \int_\Omega h^+\, d\mu - \int_\Omega h^-\, d\mu \quad \text{if this is not of the form } +\infty - \infty;$$

if it is, we say that the integral does not exist. The function $h$ is said to be
$\mu$-integrable (or simply integrable if $\mu$ is understood) iff $\int_\Omega h\, d\mu$ is finite,
that is, iff $\int_\Omega h^+\, d\mu$ and $\int_\Omega h^-\, d\mu$ are both finite.


If $A \in \mathscr{F}$, we define

$$\int_A h\, d\mu = \int_\Omega h I_A\, d\mu.$$

(The proof that $h I_A$ is Borel measurable is similar to the first proof above that
$h^+$ is Borel measurable.)

If $h$ is a step function from $R$ to $R$ and $\mu$ is Lebesgue measure, $\int_R h\, d\mu$
agrees with the Riemann integral. However, the integral of $h$ with respect to
Lebesgue measure exists for many functions that are not Riemann integrable,
as we shall see in 1.7.

Before examining the properties of the integral, we need to know more about 
Borel measurable functions. One of the basic reasons why such functions are 
useful in analysis is that a pointwise limit of Borel measurable functions is 
still Borel measurable. 


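1.5.4 Theorem. If $h_1, h_2, \ldots$ are Borel measurable functions from $\Omega$ to $\overline{R}$
and $h_n(\omega) \to h(\omega)$ for all $\omega \in \Omega$, then $h$ is Borel measurable.

PROOF. It is sufficient to show that $\{\omega : h(\omega) > c\} \in \mathscr{F}$ for each real $c$. We
have

$$\{\omega : h(\omega) > c\} = \{\omega : \lim_{n \to \infty} h_n(\omega) > c\}$$
$$= \Bigl\{\omega : h_n(\omega) \text{ is eventually} > c + \frac{1}{r} \text{ for some } r = 1, 2, \ldots\Bigr\}$$
$$= \bigcup_{r=1}^{\infty} \Bigl\{\omega : h_n(\omega) > c + \frac{1}{r} \text{ for all but finitely many } n\Bigr\}$$
$$= \bigcup_{r=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \Bigl\{\omega : h_k(\omega) > c + \frac{1}{r}\Bigr\} \in \mathscr{F}. \quad \square$$

To show that the class of Borel measurable functions is closed under
algebraic operations, we need the following basic approximation theorem.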


1.5.5 Theorem. (a) A nonnegative Borel measurable function $h$ is the limit
of an increasing sequence of nonnegative, finite-valued, simple functions $h_n$.

(b) An arbitrary Borel measurable function $f$ is the limit of a sequence
of finite-valued simple functions $f_n$, with $|f_n| \le |f|$ for all $n$.

PROOF. (a) Define

$$h_n(\omega) = \frac{k - 1}{2^n} \quad \text{if } \frac{k - 1}{2^n} \le h(\omega) < \frac{k}{2^n}, \qquad k = 1, 2, \ldots, n 2^n,$$

and let $h_n(\omega) = n$ if $h(\omega) \ge n$. [Or equally well, $h_n(\omega) = (k - 1)/2^n$ if
$(k - 1)/2^n < h(\omega) \le k/2^n$, $k = 1, 2, \ldots, n 2^n$; $h_n(\omega) = n$ if $h(\omega) > n$;
$h_n(\omega) = 0$ if $h(\omega) = 0$.] The $h_n$ have the desired properties (Problem 1).

(b) Let $g_n$ and $h_n$ be nonnegative, finite-valued, simple functions with
$g_n \uparrow f^+$ and $h_n \uparrow f^-$; take $f_n = g_n - h_n$. $\square$
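
The first approximating sequence in the proof of (a) is easy to evaluate pointwise. The following is a minimal sketch (the function name `dyadic_approx` is an illustrative choice) that computes $h_n(\omega)$ from the value $h(\omega) \ge 0$, which may be $+\infty$, and records the approximations of a sample value. It is meant as a numerical illustration, not as a proof of the properties requested in Problem 1.

```python
import math

def dyadic_approx(value, n):
    """h_n(w) for a point w with h(w) = value >= 0 (value may be math.inf):
    equals (k-1)/2^n when (k-1)/2^n <= value < k/2^n, and n when value >= n."""
    if value >= n:
        return float(n)
    return math.floor(value * 2**n) / 2**n

# approximations h_1(w), ..., h_20(w) at a point where h(w) = pi
approximations = [dyadic_approx(math.pi, n) for n in range(1, 21)]
```

For a finite value the error is below $2^{-n}$ as soon as $n$ exceeds the value, and at a point where $h(\omega) = \infty$ the approximations are $n$, which increase to $\infty$.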


1.5.6 Theorem. If $h_1$ and $h_2$ are Borel measurable functions from $\Omega$ to $\overline{R}$,
so are $h_1 + h_2$, $h_1 - h_2$, $h_1 h_2$, and $h_1/h_2$ [assuming these are well-defined, in
other words, $h_1(\omega) + h_2(\omega)$ is never of the form $+\infty - \infty$ and $h_1(\omega)/h_2(\omega)$
is never of the form $\infty/\infty$ or $a/0$].

PROOF. As in 1.5.5, let $s_{1n}, s_{2n}$ be finite-valued simple functions with
$s_{1n} \to h_1$, $s_{2n} \to h_2$. Then $s_{1n} + s_{2n} \to h_1 + h_2$,

$$s_{1n} s_{2n} I_{\{h_1 \ne 0\}} I_{\{h_2 \ne 0\}} \to h_1 h_2,$$

and

$$\frac{s_{1n}}{s_{2n} + (1/n) I_{\{s_{2n} = 0\}}} \to \frac{h_1}{h_2}.$$

Since

$$s_{1n} \pm s_{2n}, \qquad s_{1n} s_{2n} I_{\{h_1 \ne 0\}} I_{\{h_2 \ne 0\}}, \qquad s_{1n} \Bigl(s_{2n} + \frac{1}{n} I_{\{s_{2n} = 0\}}\Bigr)^{-1}$$

are simple, the result follows from 1.5.4. $\square$

We are going to extend 1.5.4 and part of 1.5.6 to Borel measurable functions
from $\Omega$ to $\overline{R}^n$; to do this, we need the following useful result.


1.5.7 Lemma. A composition of measurable functions is measurable;
specifically, if $g: (\Omega_1, \mathscr{F}_1) \to (\Omega_2, \mathscr{F}_2)$ and $h: (\Omega_2, \mathscr{F}_2) \to (\Omega_3, \mathscr{F}_3)$, then
$h \circ g: (\Omega_1, \mathscr{F}_1) \to (\Omega_3, \mathscr{F}_3)$.

PROOF. If $B \in \mathscr{F}_3$, then $(h \circ g)^{-1}(B) = g^{-1}(h^{-1}(B)) \in \mathscr{F}_1$. $\square$

Since some books contain the statement "A composition of measurable
functions need not be measurable," some explanation is called for. If
$h: R \to R$, some authors call $h$ "measurable" iff the preimage of a Borel set
is a Lebesgue measurable set. We shall call such a function Lebesgue measurable.
Note that every Borel measurable function is Lebesgue measurable, but
not conversely. (Consider the indicator of a Lebesgue measurable set that is
not a Borel set; see Section 1.4, Problem 11.) If $g$ and $h$ are Lebesgue
measurable, the composition $h \circ g$ need not be Lebesgue measurable. Let $\mathscr{B}$ be the
Borel sets, and $\overline{\mathscr{B}}$ the Lebesgue measurable sets. If $B \in \mathscr{B}$ then $h^{-1}(B) \in \overline{\mathscr{B}}$,
but $g^{-1}(h^{-1}(B))$ is known to belong to $\overline{\mathscr{B}}$ only when $h^{-1}(B) \in \mathscr{B}$, so we
cannot conclude that $(h \circ g)^{-1}(B) \in \overline{\mathscr{B}}$. For an explicit example, see Royden
(1968, p. 70). If $g^{-1}(A) \in \overline{\mathscr{B}}$ for all $A \in \overline{\mathscr{B}}$, not just for all $A \in \mathscr{B}$, then we are
in the situation described in Lemma 1.5.7, and $h \circ g$ is Lebesgue measurable;
similarly, if $h$ is Borel measurable (and $g$ is Lebesgue measurable), then $h \circ g$
is Lebesgue measurable.

It is rarely necessary to replace Borel measurability of functions from $R$ to $R$
(or $R^k$ to $R^n$) by the slightly more general concept of Lebesgue measurability;
in this book, the only instance is in 1.7. The integration theory that we are
developing works for extended real-valued functions on an arbitrary measure
space $(\Omega, \mathscr{F}, \mu)$. Thus there is no problem in integrating Lebesgue measurable
functions; $\Omega = R$, $\mathscr{F} = \overline{\mathscr{B}}(R)$.

We may now assert that if $h_1, h_2, \ldots$ are Borel measurable functions from
$\Omega$ to $\overline{R}^n$ and $h_n$ converges pointwise to $h$, then $h$ is Borel measurable;
furthermore, if $h_1$ and $h_2$ are Borel measurable functions from $\Omega$ to $\overline{R}^n$, so
are $h_1 + h_2$ and $h_1 - h_2$, assuming these are well-defined. The reason is that if
$h(\omega) = (h_1(\omega), \ldots, h_n(\omega))$ describes a map from $\Omega$ to $\overline{R}^n$, Borel measurability
of $h$ is equivalent to Borel measurability of all the component functions $h_i$.


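1.5.8 Theorem. Let $h: \Omega \to \overline{R}^n$; if $p_i$ is the projection map of $\overline{R}^n$ onto $\overline{R}$,
taking $(x_1, \ldots, x_n)$ to $x_i$, set $h_i = p_i \circ h$, $i = 1, \ldots, n$. Then $h$ is Borel
measurable iff $h_i$ is Borel measurable for all $i = 1, \ldots, n$.

PROOF. Assume $h$ Borel measurable. Since

$$p_i^{-1}\{x_i : a_i \le x_i \le b_i\} = \{x \in \overline{R}^n : a_i \le x_i \le b_i,\ -\infty \le x_j \le \infty,\ j \ne i\},$$

which is an interval of $\overline{R}^n$, $p_i$ is Borel measurable. Thus

$$h: (\Omega, \mathscr{F}) \to (\overline{R}^n, \mathscr{B}(\overline{R}^n)), \qquad p_i: (\overline{R}^n, \mathscr{B}(\overline{R}^n)) \to (\overline{R}, \mathscr{B}(\overline{R})),$$

and therefore by 1.5.7, $h_i: (\Omega, \mathscr{F}) \to (\overline{R}, \mathscr{B}(\overline{R}))$.

Conversely, assume each $h_i$ to be Borel measurable. Then

$$h^{-1}\{x \in \overline{R}^n : a_i \le x_i \le b_i,\ i = 1, \ldots, n\} = \bigcap_{i=1}^{n} \{\omega \in \Omega : a_i \le h_i(\omega) \le b_i\} \in \mathscr{F},$$

and the result follows. $\square$

We now proceed to some properties of the integral. In the following result,
all functions are assumed Borel measurable from $\Omega$ to $\overline{R}$.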


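1.5.9 Theorem. (a) If $\int_\Omega h\, d\mu$ exists and $c \in R$, then $\int_\Omega ch\, d\mu$ exists and
equals $c \int_\Omega h\, d\mu$.

(b) If $g(\omega) \ge h(\omega)$ for all $\omega$, then $\int_\Omega g\, d\mu \ge \int_\Omega h\, d\mu$ in the sense that
if $\int_\Omega h\, d\mu$ exists and is greater than $-\infty$, then $\int_\Omega g\, d\mu$ exists and
$\int_\Omega g\, d\mu \ge \int_\Omega h\, d\mu$; if $\int_\Omega g\, d\mu$ exists and is less than $+\infty$, then $\int_\Omega h\, d\mu$ exists and
$\int_\Omega h\, d\mu \le \int_\Omega g\, d\mu$. Thus if both integrals exist, $\int_\Omega g\, d\mu \ge \int_\Omega h\, d\mu$, whether or
not the integrals are finite.

(c) If $\int_\Omega h\, d\mu$ exists, then $\bigl|\int_\Omega h\, d\mu\bigr| \le \int_\Omega |h|\, d\mu$.

(d) If $h \ge 0$ and $B \in \mathscr{F}$, then $\int_B h\, d\mu = \sup\{\int_B s\, d\mu : 0 \le s \le h,\ s \text{ simple}\}$.

(e) If $\int_\Omega h\, d\mu$ exists, so does $\int_A h\, d\mu$ for each $A \in \mathscr{F}$; if $\int_\Omega h\, d\mu$ is finite,
then $\int_A h\, d\mu$ is also finite for each $A \in \mathscr{F}$.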


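PROOF. (a) It is immediate that this holds when $h$ is simple. If $h$ is
nonnegative and $c > 0$, then

$$\int_\Omega ch\, d\mu = \sup\Bigl\{\int_\Omega s\, d\mu : 0 \le s \le ch,\ s \text{ simple}\Bigr\}$$
$$= c \sup\Bigl\{\int_\Omega \frac{s}{c}\, d\mu : 0 \le \frac{s}{c} \le h,\ \frac{s}{c} \text{ simple}\Bigr\} = c \int_\Omega h\, d\mu.$$

In general, if $h = h^+ - h^-$ and $c > 0$, then $(ch)^+ = ch^+$, $(ch)^- = ch^-$; hence
$\int_\Omega ch\, d\mu = c \int_\Omega h^+\, d\mu - c \int_\Omega h^-\, d\mu$ by what we have just proved, so that
$\int_\Omega ch\, d\mu = c \int_\Omega h\, d\mu$. If $c < 0$, then

$$(ch)^+ = -ch^-, \qquad (ch)^- = -ch^+,$$

so

$$\int_\Omega ch\, d\mu = -c \int_\Omega h^-\, d\mu + c \int_\Omega h^+\, d\mu = c \int_\Omega h\, d\mu.$$

(b) If $g$ and $h$ are nonnegative and $0 \le s \le h$, $s$ simple, then $0 \le s \le g$;
hence $\int_\Omega h\, d\mu \le \int_\Omega g\, d\mu$. In general, $h \le g$ implies $h^+ \le g^+$, $h^- \ge g^-$. If
$\int_\Omega h\, d\mu > -\infty$, we have $\int_\Omega g^-\, d\mu \le \int_\Omega h^-\, d\mu < \infty$; hence $\int_\Omega g\, d\mu$ exists and
equals

$$\int_\Omega g^+\, d\mu - \int_\Omega g^-\, d\mu \ge \int_\Omega h^+\, d\mu - \int_\Omega h^-\, d\mu = \int_\Omega h\, d\mu.$$

The case in which $\int_\Omega g\, d\mu < \infty$ is handled similarly.

(c) We have $-|h| \le h \le |h|$ so by (a) and (b), $-\int_\Omega |h|\, d\mu \le \int_\Omega h\, d\mu \le \int_\Omega |h|\, d\mu$
and the result follows. (Note that $|h|$ is Borel measurable by 1.5.6 since
$|h| = h^+ + h^-$.)

(d) If $0 \le s \le h$, then $\int_B s\, d\mu \le \int_B h\, d\mu$ by (b); hence

$$\int_B h\, d\mu \ge \sup\Bigl\{\int_B s\, d\mu : 0 \le s \le h\Bigr\}.$$

If $0 \le t \le h I_B$, $t$ simple, then $t = t I_B \le h$ so $\int_\Omega t\, d\mu \le \sup\{\int_\Omega s I_B\, d\mu :
0 \le s \le h,\ s \text{ simple}\}$. Take the sup over $t$ to obtain $\int_B h\, d\mu \le \sup\{\int_B s\, d\mu :
0 \le s \le h,\ s \text{ simple}\}$.

(e) This follows from (b) and the fact that $(h I_A)^+ = h^+ I_A \le h^+$,
$(h I_A)^- = h^- I_A \le h^-$. $\square$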


Problems 


1. Show that the functions proposed in the proof of 1.5.5(a) have the desired 
properties. Show also that if h is bounded, the approximating sequence 
converges to h uniformly on Q. 


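2. Let $f$ and $g$ be extended real-valued Borel measurable functions on
$(\Omega, \mathscr{F})$, and define

$$h(\omega) = f(\omega) \text{ if } \omega \in A, \qquad h(\omega) = g(\omega) \text{ if } \omega \in A^c,$$

where $A$ is a set in $\mathscr{F}$. Show that $h$ is Borel measurable.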


3. If $f_1, f_2, \ldots$ are extended real-valued Borel measurable functions on
$(\Omega, \mathscr{F})$, show that $\sup_n f_n$ and $\inf_n f_n$ are Borel
measurable (hence $\limsup_{n \to \infty} f_n$ and $\liminf_{n \to \infty} f_n$ are Borel
measurable).


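4. Let $(\Omega, \mathcal{F}, \mu)$ be a complete measure space. If $f: (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$ and $g: \Omega \to \Omega'$, $g = f$ except on a subset of a set $A \in \mathcal{F}$ with $\mu(A) = 0$, show that g is measurable (relative to $\mathcal{F}$ and $\mathcal{F}'$).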


*5. (a) Let f be a function from $R^k$ to $R^m$, not necessarily Borel measurable. Show that {x: f is discontinuous at x} is an $F_\sigma$ (a countable union of closed subsets of $R^k$), and hence is a Borel set. Does this result hold in spaces more general than the Euclidean space $R^n$?

(b) Show that there is no function from R to R whose discontinuity set is the irrationals. (In 1.4.5 we constructed a distribution function whose discontinuity set was the rationals.)


6. How many Borel measurable functions are there from $R^n$ to $R^k$?


*7. We have seen that a pointwise limit of measurable functions is measurable. We may also show that under certain conditions, a pointwise limit of measures is a measure. The following result, known as Steinhaus' lemma, will be needed in the problem: If $\{a_{nk}\}$ is a double sequence of real numbers satisfying

(i) $\sum_{k=1}^\infty a_{nk} = 1$ for all n,
(ii) $\sum_{k=1}^\infty |a_{nk}| \le c < \infty$ for all n, and
(iii) $a_{nk} \to 0$ as $n \to \infty$ for all k,

there is a sequence $\{x_n\}$, with $x_n = 0$ or 1 for all n, such that $t_n = \sum_{k=1}^\infty a_{nk}x_k$ fails to converge to a finite or infinite limit.

To prove this, choose positive integers $n_1$ and $k_1$ arbitrarily; having chosen $n_1, \ldots, n_r, k_1, \ldots, k_r$, choose $n_{r+1} > n_r$ such that $\sum_{k \le k_r} |a_{n_{r+1}k}| < \frac{1}{8}$; this is possible by (iii). Then choose $k_{r+1} > k_r$ such that $\sum_{k > k_{r+1}} |a_{n_{r+1}k}| < \frac{1}{8}$; this is possible by (ii). Set $x_k = 0$, $k_{2s-1} < k \le k_{2s}$; $x_k = 1$, $k_{2s} < k \le k_{2s+1}$, $s = 1, 2, \ldots$. We may write $t_{n_{r+1}}$ as $h_1 + h_2 + h_3$, where $h_1$ is the sum of $a_{n_{r+1}k}x_k$ for $k \le k_r$, $h_2$ corresponds to $k_r < k \le k_{r+1}$, and $h_3$ to $k > k_{r+1}$. If r is odd, then $x_k = 0$, $k_r < k \le k_{r+1}$; hence $|t_{n_{r+1}}| < \frac{1}{4}$. If r is even, then $h_2 = \sum_{k_r < k \le k_{r+1}} a_{n_{r+1}k}$; hence by (i),

$$h_2 = 1 - \sum_{k \le k_r} a_{n_{r+1}k} - \sum_{k > k_{r+1}} a_{n_{r+1}k} > \frac{3}{4}.$$

Thus $t_{n_{r+1}} > \frac{3}{4} - |h_1| - |h_3| > \frac{1}{2}$, so $\{t_n\}$ cannot converge.


(a) Vitali-Hahn-Saks Theorem. Let $(\Omega, \mathcal{F})$ be a measurable space, and let $P_n$, $n = 1, 2, \ldots$, be probability measures on $\mathcal{F}$. If $P_n(A) \to P(A)$ for all $A \in \mathcal{F}$, then P is a probability measure on $\mathcal{F}$; furthermore, if $\{B_k\}$ is a sequence of sets in $\mathcal{F}$ decreasing to $\emptyset$, then $\sup_n P_n(B_k) \downarrow 0$ as $k \to \infty$. [Let A be the disjoint union of sets $A_k \in \mathcal{F}$; without loss of generality, assume $A = \Omega$ (otherwise add $A^c$ to both sides). It is immediate that P is finitely additive, so by Problem 5, Section 1.2, $\alpha = \sum_k P(A_k) \le P(\Omega) = 1$. If $\alpha < 1$, set $a_{nk} = (1 - \alpha)^{-1}[P_n(A_k) - P(A_k)]$ and apply Steinhaus' lemma.]

(b) Extend the Vitali-Hahn-Saks theorem to the case where the $P_n$ are not necessarily probability measures, but $P_n(\Omega) \le c < \infty$ for all n. [For further extensions, see Dunford and Schwartz (1958).]


1.6 BASIC INTEGRATION THEOREMS

We are now ready to present the main properties of the integral. The results in this section will be used many times in the text. As above, $(\Omega, \mathcal{F}, \mu)$ is a fixed measure space, and all functions to be considered map $\Omega$ to $\overline{R}$.


1.6.1 Theorem. Let h be a Borel measurable function such that $\int_\Omega h\,d\mu$ exists. Define $\lambda(B) = \int_B h\,d\mu$, $B \in \mathcal{F}$. Then $\lambda$ is countably additive on $\mathcal{F}$; thus if $h \ge 0$, $\lambda$ is a measure.


Proof. Let h be a nonnegative simple function $\sum_{i=1}^n x_i I_{A_i}$. Then $\lambda(B) = \int_B h\,d\mu = \sum_{i=1}^n x_i\,\mu(B \cap A_i)$; since $\mu$ is countably additive, so is $\lambda$.

Now let h be nonnegative Borel measurable, and let $B = \bigcup_{n=1}^\infty B_n$, the $B_n$ disjoint sets in $\mathcal{F}$. If s is simple and $0 \le s \le h$, then

$$\int_B s\,d\mu = \sum_{n=1}^\infty \int_{B_n} s\,d\mu$$

by what we have proved for nonnegative simple functions

$$\le \sum_{n=1}^\infty \int_{B_n} h\,d\mu$$

by 1.5.9(b) (or the definition of the integral).

Take the sup over s to obtain, by 1.5.9(d), $\lambda(B) \le \sum_{n=1}^\infty \lambda(B_n)$.

Now $B_n \subset B$, hence $I_{B_n} \le I_B$, so by 1.5.9(b), $\lambda(B_n) \le \lambda(B)$. If $\lambda(B_n) = \infty$ for some n, we are finished, so assume all $\lambda(B_n)$ finite. Fix n and let $\varepsilon > 0$. It follows from 1.5.9(b), (d) and the fact that the maximum of a finite number of simple functions is simple that we can find a simple function s, $0 \le s \le h$, such that

$$\int_{B_i} s\,d\mu \ge \int_{B_i} h\,d\mu - \frac{\varepsilon}{n}, \qquad i = 1, 2, \ldots, n.$$

Now

$$\lambda(B_1 \cup \cdots \cup B_n) = \int_{\bigcup_{i=1}^n B_i} h\,d\mu \ge \int_{\bigcup_{i=1}^n B_i} s\,d\mu = \sum_{i=1}^n \int_{B_i} s\,d\mu$$

by what we have proved for nonnegative simple functions, hence

$$\lambda(B_1 \cup \cdots \cup B_n) \ge \sum_{i=1}^n \int_{B_i} h\,d\mu - \varepsilon = \sum_{i=1}^n \lambda(B_i) - \varepsilon.$$

Since $\lambda(B) \ge \lambda(\bigcup_{i=1}^n B_i)$ and $\varepsilon$ is arbitrary, we have

$$\lambda(B) \ge \sum_{i=1}^\infty \lambda(B_i).$$

Finally let $h = h^+ - h^-$ be an arbitrary Borel measurable function. Then $\lambda(B) = \int_B h^+\,d\mu - \int_B h^-\,d\mu$. Since $\int_\Omega h^+\,d\mu < \infty$ or $\int_\Omega h^-\,d\mu < \infty$, the result follows. □


The proof of 1.6.1 shows that $\lambda$ is the difference of two measures $\lambda^+$ and $\lambda^-$, where $\lambda^+(B) = \int_B h^+\,d\mu$, $\lambda^-(B) = \int_B h^-\,d\mu$; at least one of the measures $\lambda^+$ and $\lambda^-$ must be finite.


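1.6.2 Monotone Convergence Theorem. Let $h_1, h_2, \ldots$ form an increasing sequence of nonnegative Borel measurable functions, and let $h(\omega) = \lim_{n\to\infty} h_n(\omega)$, $\omega \in \Omega$. Then $\int_\Omega h_n\,d\mu \to \int_\Omega h\,d\mu$. [Note that $\int_\Omega h_n\,d\mu$ increases with n by 1.5.9(b); for short, $0 \le h_n \uparrow h$ implies $\int_\Omega h_n\,d\mu \uparrow \int_\Omega h\,d\mu$.]


Proof. By 1.5.9(b), $\int_\Omega h_n\,d\mu \le \int_\Omega h\,d\mu$ for all n, hence $k = \lim_{n\to\infty} \int_\Omega h_n\,d\mu \le \int_\Omega h\,d\mu$. Let $0 < b < 1$, and let s be a nonnegative, finite-valued, simple function with $s \le h$. Let $B_n = \{\omega : h_n(\omega) \ge b\,s(\omega)\}$. Then $B_n \uparrow \Omega$ since $h_n \uparrow h$ and s is finite-valued. Now $k \ge \int_\Omega h_n\,d\mu \ge \int_{B_n} h_n\,d\mu$ by 1.5.9(b), and $\int_{B_n} h_n\,d\mu \ge b\int_{B_n} s\,d\mu$ by 1.5.9(a) and (b). By 1.6.1 and 1.2.7, $\int_{B_n} s\,d\mu \to \int_\Omega s\,d\mu$, hence (let $b \to 1$) $k \ge \int_\Omega s\,d\mu$. Take the sup over s to obtain $k \ge \int_\Omega h\,d\mu$. □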
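
As a numerical illustration of 1.6.2 (a sketch that is not part of the original text), one may take $\Omega = \{1, 2, \ldots\}$ with $\mu\{k\} = 2^{-k}$, $h(k) = k$, and $h_n = \min(h, n)$. The measure space, the truncation level `K`, and the helper names `integral`, `h`, and `h_n` are all assumptions made for the example. Here $0 \le h_n \uparrow h$, and the integrals $\int_\Omega h_n\,d\mu = 2 - 2^{1-n}$ should increase to $\int_\Omega h\,d\mu = 2$.

```python
# Monotone convergence on a discrete measure space (illustrative assumptions):
# Omega = {1, 2, ...}, mu({k}) = 2**-k, h(k) = k, h_n = min(h, n).
K = 200  # truncation level; the neglected tail of each sum is below 2**-190


def integral(func):
    """Approximate the integral of func with respect to mu by a truncated sum."""
    return sum(func(k) * 2.0 ** -k for k in range(1, K + 1))


def h(k):
    return k


def h_n(n):
    """The truncated function min(h, n), returned as a function of k."""
    return lambda k: min(k, n)


values = [integral(h_n(n)) for n in range(1, 21)]  # integrals of h_1, ..., h_20
limit = integral(h)                                # integral of the limit function

for n in (1, 2, 5, 10, 20):
    print(n, values[n - 1])
print("integral of h:", limit)
```

The printed integrals increase with n and approach the integral of h, as the theorem asserts; the truncation at `K` only affects digits far beyond double precision.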


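1.6.3 Additivity Theorem. Let f and g be Borel measurable, and assume that $f + g$ is well-defined. If $\int_\Omega f\,d\mu$ and $\int_\Omega g\,d\mu$ exist and $\int_\Omega f\,d\mu + \int_\Omega g\,d\mu$ is well-defined (not of the form $+\infty - \infty$ or $-\infty + \infty$), then

$$\int_\Omega (f + g)\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu.$$

In particular, if f and g are integrable, so is $f + g$.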
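Proof. If f and g are nonnegative simple functions, this is immediate from the definition of the integral. Assume f and g are nonnegative Borel measurable, and let $t_n$, $u_n$ be nonnegative simple functions increasing to f and g, respectively. Then $0 \le s_n = t_n + u_n \uparrow f + g$. Now $\int_\Omega s_n\,d\mu = \int_\Omega t_n\,d\mu + \int_\Omega u_n\,d\mu$ by what we have proved for nonnegative simple functions, hence by 1.6.2, $\int_\Omega (f + g)\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu$.

Now if $f \ge 0$, $g \le 0$, $h = f + g \ge 0$ (so g must be finite), we have $f = h + (-g)$; hence $\int_\Omega f\,d\mu = \int_\Omega h\,d\mu - \int_\Omega g\,d\mu$. If $\int_\Omega g\,d\mu$ is finite, then $\int_\Omega h\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu$, and if $\int_\Omega g\,d\mu = -\infty$, then since $h \ge 0$,

$$\int_\Omega f\,d\mu \ge -\int_\Omega g\,d\mu = \infty,$$

contradicting the hypothesis that $\int_\Omega f\,d\mu + \int_\Omega g\,d\mu$ is well-defined. Similarly, if $f \ge 0$, $g \le 0$, $h \le 0$, we obtain $\int_\Omega h\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu$ by replacing all functions by their negatives. (Explicitly, $-g \ge 0$, $-f \le 0$, $-h = -f - g \ge 0$, and the above argument applies.)

Let

$E_1 = \{\omega : f(\omega) \ge 0,\ g(\omega) \ge 0\}$,
$E_2 = \{\omega : f(\omega) \ge 0,\ g(\omega) < 0,\ h(\omega) \ge 0\}$,
$E_3 = \{\omega : f(\omega) \ge 0,\ g(\omega) < 0,\ h(\omega) < 0\}$,
$E_4 = \{\omega : f(\omega) < 0,\ g(\omega) \ge 0,\ h(\omega) \ge 0\}$,
$E_5 = \{\omega : f(\omega) < 0,\ g(\omega) \ge 0,\ h(\omega) < 0\}$,
$E_6 = \{\omega : f(\omega) < 0,\ g(\omega) < 0\}$.

The above argument shows that $\int_{E_i} h\,d\mu = \int_{E_i} f\,d\mu + \int_{E_i} g\,d\mu$. Now $\int_\Omega f\,d\mu = \sum_{i=1}^6 \int_{E_i} f\,d\mu$, $\int_\Omega g\,d\mu = \sum_{i=1}^6 \int_{E_i} g\,d\mu$ by 1.6.1, so that $\int_\Omega f\,d\mu + \int_\Omega g\,d\mu = \sum_{i=1}^6 \int_{E_i} h\,d\mu$, and this equals $\int_\Omega h\,d\mu$ by 1.6.1, if we can show that $\int_\Omega h\,d\mu$ exists; that is, $\int_\Omega h^+\,d\mu$ and $\int_\Omega h^-\,d\mu$ are not both infinite.

If both were infinite, then $\int_{E_i} h^+\,d\mu = \int_{E_j} h^-\,d\mu = \infty$ for some i, j (1.6.1 again), so that $\int_{E_i} h\,d\mu = \infty$, $\int_{E_j} h\,d\mu = -\infty$. But then $\int_{E_i} f\,d\mu$ or $\int_{E_i} g\,d\mu = \infty$; hence $\int_\Omega f\,d\mu$ or $\int_\Omega g\,d\mu = \infty$. (Note that $\int_\Omega f^+\,d\mu \ge \int_{E_i} f^+\,d\mu$.) Similarly $\int_\Omega f\,d\mu$ or $\int_\Omega g\,d\mu = -\infty$, and this is a contradiction. □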
1.6.4 Corollaries. (a) If $h_1, h_2, \ldots$ are nonnegative Borel measurable,

$$\int_\Omega \sum_{n=1}^\infty h_n\,d\mu = \sum_{n=1}^\infty \int_\Omega h_n\,d\mu.$$

Thus any series of nonnegative Borel measurable functions may be integrated term by term.

(b) If h is Borel measurable, h is integrable iff $|h|$ is integrable.

(c) If g and h are Borel measurable with $|g| \le h$, h integrable, then g is integrable.


Proof. (a) $\sum_{k=1}^n h_k \uparrow \sum_{k=1}^\infty h_k$, and the result follows from 1.6.2 and 1.6.3.
(b) Since $|h| = h^+ + h^-$, this follows from the definition of the integral and 1.6.3.
(c) By 1.5.9(b), $|g|$ is integrable, and the result follows from (b) above. □


A condition is said to hold almost everywhere with respect to the measure $\mu$ (written a.e. $[\mu]$ or simply a.e. if $\mu$ is understood) iff there is a set $B \in \mathcal{F}$ of $\mu$-measure 0 such that the condition holds outside of B. From the point of view of integration theory, functions that differ only on a set of measure 0 may be identified. This is established by the following result.


1.6.5 Theorem. Let f, g, and h be Borel measurable functions.

(a) If $f = 0$ a.e. $[\mu]$, then $\int_\Omega f\,d\mu = 0$.
(b) If $g = h$ a.e. $[\mu]$ and $\int_\Omega g\,d\mu$ exists, then so does $\int_\Omega h\,d\mu$, and $\int_\Omega g\,d\mu = \int_\Omega h\,d\mu$.


Proof. (a) If $f = \sum_{i=1}^n x_i I_{A_i}$ is simple, then $x_i \ne 0$ implies $\mu(A_i) = 0$ by hypothesis, hence $\int_\Omega f\,d\mu = 0$. If $f \ge 0$ and $0 \le s \le f$, s simple, then $s = 0$ a.e. $[\mu]$, hence $\int_\Omega s\,d\mu = 0$; thus $\int_\Omega f\,d\mu = 0$. If $f = f^+ - f^-$, then $f^+$ and $f^-$, being less than or equal to $|f|$, are 0 a.e. $[\mu]$, and the result follows.

(b) Let $A = \{\omega : g(\omega) = h(\omega)\}$, $B = A^c$. Then $g = gI_A + gI_B$, $h = hI_A + hI_B = gI_A + hI_B$. Since $gI_B = hI_B = 0$ except on B, a set of measure 0, the result follows from part (a) and 1.6.3. □


Thus in any integration theorem, we may freely use the phrase "almost everywhere." For example, if $\{h_n\}$ is an increasing sequence of nonnegative Borel measurable functions converging a.e. to the Borel measurable function h, then $\int_\Omega h_n\,d\mu \uparrow \int_\Omega h\,d\mu$.

Another example: If g and h are Borel measurable and $g \ge h$ a.e., then $\int_\Omega g\,d\mu \ge \int_\Omega h\,d\mu$ [in the sense of 1.5.9(b)].
1.6.6 Theorem. Let h be Borel measurable.

(a) If h is integrable, then h is finite a.e.
(b) If $h \ge 0$ and $\int_\Omega h\,d\mu = 0$, then $h = 0$ a.e.


Proof. (a) Let $A = \{\omega : |h(\omega)| = \infty\}$. If $\mu(A) > 0$, then $\int_\Omega |h|\,d\mu \ge \int_A |h|\,d\mu = \infty \cdot \mu(A) = \infty$, a contradiction.

(b) Let $B = \{\omega : h(\omega) > 0\}$, $B_n = \{\omega : h(\omega) \ge 1/n\} \uparrow B$. We have $0 \le hI_{B_n} \le hI_B = h$, hence by 1.5.9(b), $\int_{B_n} h\,d\mu = 0$. But $\int_{B_n} h\,d\mu \ge (1/n)\mu(B_n)$, so that $\mu(B_n) = 0$ for all n, and thus $\mu(B) = 0$. □


The monotone convergence theorem was proved under the hypothesis that 
all functions were nonnegative. This assumption can be relaxed considerably, 
as we now prove. 


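1.6.7 Extended Monotone Convergence Theorem. Let $g_1, g_2, \ldots, g, h$ be Borel measurable.

(a) If $g_n \ge h$ for all n, where $\int_\Omega h\,d\mu > -\infty$, and $g_n \uparrow g$, then

$$\int_\Omega g_n\,d\mu \uparrow \int_\Omega g\,d\mu.$$

(b) If $g_n \le h$ for all n, where $\int_\Omega h\,d\mu < \infty$, and $g_n \downarrow g$, then

$$\int_\Omega g_n\,d\mu \downarrow \int_\Omega g\,d\mu.$$


Proof. (a) If $\int_\Omega h\,d\mu = \infty$, then by 1.5.9(b), $\int_\Omega g_n\,d\mu = \infty$ for all n, and $\int_\Omega g\,d\mu = \infty$. Thus assume $\int_\Omega h\,d\mu < \infty$, so that by 1.6.6(a), h is a.e. finite; change h to 0 on the set where it is infinite. Then $0 \le g_n - h \uparrow g - h$ a.e., hence by 1.6.2, $\int_\Omega (g_n - h)\,d\mu \uparrow \int_\Omega (g - h)\,d\mu$. The result follows from 1.6.3. (We must check that the additivity theorem actually applies. Since $\int_\Omega h\,d\mu > -\infty$, $\int_\Omega g_n\,d\mu$ and $\int_\Omega g\,d\mu$ exist and are greater than $-\infty$ by 1.5.9(b). Also, $\int_\Omega h\,d\mu$ is finite, so that $\int_\Omega g_n\,d\mu - \int_\Omega h\,d\mu$ and $\int_\Omega g\,d\mu - \int_\Omega h\,d\mu$ are well-defined.)

(b) $-g_n \ge -h$, $\int_\Omega -h\,d\mu > -\infty$, and $-g_n \uparrow -g$. By part (a), $-\int_\Omega g_n\,d\mu \uparrow -\int_\Omega g\,d\mu$, so $\int_\Omega g_n\,d\mu \downarrow \int_\Omega g\,d\mu$. □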


The extended monotone convergence theorem asserts that under appropriate conditions, the limit of the integrals of a sequence of functions is the integral of the limit function. More general theorems of this type can be obtained if we replace limits by upper or lower limits. If $f_1, f_2, \ldots$ are functions from $\Omega$ to $\overline{R}$, $\liminf_{n\to\infty} f_n$ and $\limsup_{n\to\infty} f_n$ are defined pointwise, that is,

$$\Big(\liminf_{n\to\infty} f_n\Big)(\omega) = \sup_n \inf_{k \ge n} f_k(\omega),$$

$$\Big(\limsup_{n\to\infty} f_n\Big)(\omega) = \inf_n \sup_{k \ge n} f_k(\omega).$$


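1.6.8 Fatou's Lemma. Let $f_1, f_2, \ldots, f$ be Borel measurable.

(a) If $f_n \ge f$ for all n, where $\int_\Omega f\,d\mu > -\infty$, then

$$\liminf_{n\to\infty} \int_\Omega f_n\,d\mu \ge \int_\Omega \Big(\liminf_{n\to\infty} f_n\Big)\,d\mu.$$

(b) If $f_n \le f$ for all n, where $\int_\Omega f\,d\mu < \infty$, then

$$\limsup_{n\to\infty} \int_\Omega f_n\,d\mu \le \int_\Omega \Big(\limsup_{n\to\infty} f_n\Big)\,d\mu.$$


Proof. (a) Let $g_n = \inf_{k \ge n} f_k$, $g = \liminf_{n\to\infty} f_n$. Then $g_n \ge f$ for all n, $\int_\Omega f\,d\mu > -\infty$, and $g_n \uparrow g$. By 1.6.7, $\int_\Omega g_n\,d\mu \uparrow \int_\Omega (\liminf_{n\to\infty} f_n)\,d\mu$. But $g_n \le f_n$, so

$$\lim_{n\to\infty} \int_\Omega g_n\,d\mu = \liminf_{n\to\infty} \int_\Omega g_n\,d\mu \le \liminf_{n\to\infty} \int_\Omega f_n\,d\mu.$$

(b) We may write

$$\int_\Omega \Big(\limsup_{n\to\infty} f_n\Big)\,d\mu = -\int_\Omega \liminf_{n\to\infty} (-f_n)\,d\mu \ge -\liminf_{n\to\infty} \int_\Omega (-f_n)\,d\mu \quad \text{by (a)}$$

$$= \limsup_{n\to\infty} \int_\Omega f_n\,d\mu. \qquad \Box$$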
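
The inequality in 1.6.8(a) can be strict. The following sketch (an illustration under assumptions, not taken from the text) uses $\Omega = \{0, 1\}$ with counting measure and lets $f_n$ be the indicator of the point $n \bmod 2$. The space, the sequence, and the helper names are hypothetical choices; the code also relies on the fact that for a periodic sequence the lim inf equals the minimum over one period.

```python
# Fatou's lemma on Omega = {0, 1} with counting measure (illustrative choice).
# f_n is the indicator of the point n mod 2, so the sequence has period 2.
OMEGA = (0, 1)
PERIOD = 2


def f(n, w):
    """Value of f_n at the point w."""
    return 1 if w == n % 2 else 0


def integral(func):
    """Integral with respect to counting measure on OMEGA: a finite sum."""
    return sum(func(w) for w in OMEGA)


def liminf_f(w):
    """Pointwise lim inf of f_n(w); for a periodic sequence, the min over a period."""
    return min(f(n, w) for n in range(PERIOD))


# The integrals of f_n are also periodic in n, so their lim inf is again a minimum.
integrals = [integral(lambda w, n=n: f(n, w)) for n in range(PERIOD)]
liminf_of_integrals = min(integrals)
integral_of_liminf = integral(liminf_f)

print(liminf_of_integrals, integral_of_liminf)
```

Each $f_n$ has integral 1, while the pointwise lim inf is identically 0, so the left side of the inequality exceeds the right side.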


The following result is one of the “bread and butter” theorems of analysis; 
it will be used quite often in later chapters. 


1.6.9 Dominated Convergence Theorem. If $f_1, f_2, \ldots, f, g$ are Borel measurable, $|f_n| \le g$ for all n, where g is $\mu$-integrable, and $f_n \to f$ a.e. $[\mu]$, then f is $\mu$-integrable and $\int_\Omega f_n\,d\mu \to \int_\Omega f\,d\mu$.

Proof. We have $|f| \le g$ a.e.; hence f is integrable by 1.6.4(c). By 1.6.8,

$$\int_\Omega \Big(\liminf_{n\to\infty} f_n\Big)\,d\mu \le \liminf_{n\to\infty} \int_\Omega f_n\,d\mu \le \limsup_{n\to\infty} \int_\Omega f_n\,d\mu \le \int_\Omega \Big(\limsup_{n\to\infty} f_n\Big)\,d\mu.$$

By hypothesis, $\liminf_{n\to\infty} f_n = \limsup_{n\to\infty} f_n = f$ a.e., so all terms of the above inequality are equal to $\int_\Omega f\,d\mu$. □


1.6.10 Corollary. If $f_1, f_2, \ldots, f, g$ are Borel measurable, $|f_n| \le g$ for all n, where $|g|^p$ is $\mu$-integrable ($p > 0$, fixed), and $f_n \to f$ a.e. $[\mu]$, then $|f|^p$ is $\mu$-integrable and $\int_\Omega |f_n - f|^p\,d\mu \to 0$ as $n \to \infty$.

Proof. We have $|f_n|^p \le |g|^p$ for all n; so $|f|^p \le |g|^p$ a.e., and therefore $|f|^p$ is integrable. Also $|f_n - f|^p \le (|f_n| + |f|)^p \le (2|g|)^p$ a.e., which is integrable, and the result follows from 1.6.9. □


We have seen in 1.5.9(b) that $g \le h$ implies $\int_\Omega g\,d\mu \le \int_\Omega h\,d\mu$, and in fact $\int_A g\,d\mu \le \int_A h\,d\mu$ for all $A \in \mathcal{F}$. There is a converse to this result.


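1.6.11 Theorem. If $\mu$ is $\sigma$-finite on $\mathcal{F}$, g and h are Borel measurable, $\int_\Omega g\,d\mu$ and $\int_\Omega h\,d\mu$ exist, and $\int_A g\,d\mu \le \int_A h\,d\mu$ for all $A \in \mathcal{F}$, then $g \le h$ a.e. $[\mu]$.


Proof. It is sufficient to prove this when $\mu$ is finite. Let

$$A_n = \Big\{\omega : g(\omega) \ge h(\omega) + \frac{1}{n},\ |h(\omega)| \le n\Big\}.$$

Then

$$\int_{A_n} h\,d\mu \ge \int_{A_n} g\,d\mu \ge \int_{A_n} h\,d\mu + \frac{1}{n}\mu(A_n).$$

But

$$\Big|\int_{A_n} h\,d\mu\Big| \le \int_{A_n} |h|\,d\mu \le n\mu(A_n) < \infty,$$

and thus we may subtract $\int_{A_n} h\,d\mu$ to obtain $(1/n)\mu(A_n) \le 0$, hence $\mu(A_n) = 0$. Therefore $\mu(\bigcup_{n=1}^\infty A_n) = 0$; hence $\mu\{\omega : g(\omega) > h(\omega),\ h(\omega) \text{ finite}\} = 0$. Consequently $g \le h$ a.e. on $\{\omega : h(\omega) \text{ finite}\}$. Clearly, $g \le h$ everywhere on $\{\omega : h(\omega) = \infty\}$, and by taking $C_n = \{\omega : h(\omega) = -\infty,\ g(\omega) \ge -n\}$ we obtain

$$-\infty \cdot \mu(C_n) = \int_{C_n} h\,d\mu \ge \int_{C_n} g\,d\mu \ge -n\mu(C_n);$$

hence $\mu(C_n) = 0$. Thus $\mu(\bigcup_{n=1}^\infty C_n) = 0$, so that

$$\mu\{\omega : g(\omega) > h(\omega),\ h(\omega) = -\infty\} = 0.$$

Therefore $g \le h$ a.e. on $\{\omega : h(\omega) = -\infty\}$. □


If g and h are integrable, the proof is simpler. Let $B = \{\omega : g(\omega) > h(\omega)\}$. Then $\int_B g\,d\mu \le \int_B h\,d\mu \le \int_B g\,d\mu$; hence all three integrals are equal. Thus by 1.6.3, $0 = \int_B (g - h)\,d\mu = \int_\Omega (g - h)I_B\,d\mu$, with $(g - h)I_B \ge 0$. By 1.6.6(b), $(g - h)I_B = 0$ a.e., so that $g = h$ a.e. on B. But $g \le h$ on $B^c$, and the result follows. Note that in this case, $\mu$ need not be $\sigma$-finite.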

The reader may have noticed that several integration theorems in this section 
were proved by starting with nonnegative simple functions and working up 
to nonnegative measurable functions and finally to arbitrary measurable func- 
tions. This technique is quite basic and will often be useful. A good illustra- 
tion of the method is the following result, which introduces the notion of a 
measure-preserving transformation, a key concept in ergodic theory. In fact it 
is convenient here to start with indicators before proceeding to nonnegative 
simple functions. 


1.6.12 Theorem. Let $T: (\Omega, \mathcal{F}) \to (\Omega_0, \mathcal{F}_0)$ be a measurable mapping, and let $\mu$ be a measure on $\mathcal{F}$. Define a measure $\mu_0 = \mu T^{-1}$ on $\mathcal{F}_0$ by

$$\mu_0(A) = \mu(T^{-1}(A)), \qquad A \in \mathcal{F}_0.$$

If $\Omega_0 = \Omega$, $\mathcal{F}_0 = \mathcal{F}$, and $\mu_0 = \mu$, T is said to preserve the measure $\mu$.
If $f: (\Omega_0, \mathcal{F}_0) \to (\overline{R}, \mathcal{B}(\overline{R}))$ and $A \in \mathcal{F}_0$, then

$$\int_{T^{-1}A} f(T(\omega))\,d\mu(\omega) = \int_A f(\omega)\,d\mu_0(\omega),$$

in the sense that if one of the integrals exists, so does the other, and the two integrals are equal.


Proof. If f is an indicator $I_B$, the desired formula states that

$$\mu(T^{-1}A \cap T^{-1}B) = \mu_0(A \cap B),$$

which is true by definition of $\mu_0$. If f is a nonnegative simple function $\sum_{i=1}^n x_i I_{B_i}$, then

$$\int_{T^{-1}A} f(T(\omega))\,d\mu(\omega) = \sum_{i=1}^n x_i \int_{T^{-1}A} I_{B_i}(T(\omega))\,d\mu(\omega) \quad \text{by 1.6.3}$$

$$= \sum_{i=1}^n x_i \int_A I_{B_i}(\omega)\,d\mu_0(\omega) \quad \text{by what we have proved for indicators}$$

$$= \int_A f(\omega)\,d\mu_0(\omega) \quad \text{by 1.6.3.}$$

If f is a nonnegative Borel measurable function, let $f_1, f_2, \ldots$ be nonnegative simple functions increasing to f. Then $\int_{T^{-1}A} f_n(T(\omega))\,d\mu(\omega) = \int_A f_n(\omega)\,d\mu_0(\omega)$ by what we have proved for simple functions, and the monotone convergence theorem yields the desired result for f.

Finally, if $f = f^+ - f^-$ is an arbitrary Borel measurable function, we have proved that the result holds for $f^+$ and $f^-$. If, say, $\int_A f^+(\omega)\,d\mu_0(\omega) < \infty$, then $\int_{T^{-1}A} f^+(T(\omega))\,d\mu(\omega) < \infty$, and it follows that if one of the integrals exists, so does the other, and the two integrals are equal. □
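
On a finite space the change-of-variables formula of 1.6.12 reduces to a rearrangement of finite sums, which can be checked exactly. The sketch below is only an illustration: the six-point space, the weights `mu`, the map `T`, the function `f`, and the set `A` are all assumptions chosen for the example.

```python
from fractions import Fraction

# Finite illustration of 1.6.12 (all concrete choices are assumptions):
# Omega = {0, ..., 5} with point weights mu, Omega_0 = {0, 1, 2}, T(w) = w mod 3.
mu = {w: Fraction(w + 1, 21) for w in range(6)}  # weights sum to 1


def T(w):
    return w % 3


# Image measure mu_0 = mu T^{-1}: mu_0({y}) = mu(T^{-1}{y}).
mu0 = {y: sum(m for w, m in mu.items() if T(w) == y) for y in range(3)}


def f(y):
    return y * y - 1  # an arbitrary real-valued function on Omega_0


A = {1, 2}  # a subset of Omega_0
preimage = {w for w in mu if T(w) in A}  # T^{-1}A

lhs = sum(f(T(w)) * mu[w] for w in preimage)  # integral of f(T) over T^{-1}A w.r.t. mu
rhs = sum(f(y) * mu0[y] for y in A)           # integral of f over A w.r.t. mu_0
print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding error.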


If one is having difficulty proving a theorem about measurable functions 
or integration, it is often helpful to start with indicators and work upward. In 
fact it is possible to suspect that almost anything can be proved this way, but 
of course there are exceptions. For example, you will run into trouble trying 
to prove the proposition “All functions are indicators.” 

We shall adopt the following terminology: If $\mu$ is Lebesgue measure and A is an interval [a, b], $\int_A f\,d\mu$, if it exists, will often be denoted by $\int_a^b f(x)\,dx$ (or $\int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$ if we are integrating functions on $R^n$). The endpoints may be deleted from the interval without changing the integral, since the Lebesgue measure of a single point is 0. If f is integrable with respect to $\mu$, then we say that f is Lebesgue integrable. A different notation, such as $r_{ab}(f)$, will be used for the Riemann integral of f on [a, b].


Problems 


The first three problems give conditions under which some of the most commonly occurring operations in real analysis may be performed: taking a limit under the integral sign, integrating an infinite series term by term, and differentiating under the integral sign.

1. Let $f = f(x, y)$ be a real-valued function of two real variables, defined for $a < y < b$, $c < x < d$. Assume that for each x, $f(x, \cdot)$ is a Borel measurable function of y, and that there is a Borel measurable $g: (a, b) \to R$ such that $|f(x, y)| \le g(y)$ for all x, y, and $\int_a^b g(y)\,dy < \infty$. If $x_0 \in (c, d)$ and $\lim_{x \to x_0} f(x, y)$ exists for all $y \in (a, b)$, show that

$$\lim_{x \to x_0} \int_a^b f(x, y)\,dy = \int_a^b \Big[\lim_{x \to x_0} f(x, y)\Big]\,dy.$$


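2. Let $f_1, f_2, \ldots$ be Borel measurable functions on $(\Omega, \mathcal{F}, \mu)$. If

$$\sum_{n=1}^\infty \int_\Omega |f_n|\,d\mu < \infty,$$

show that $\sum_{n=1}^\infty f_n$ converges a.e. $[\mu]$ to a finite-valued function, and $\int_\Omega \big(\sum_{n=1}^\infty f_n\big)\,d\mu = \sum_{n=1}^\infty \int_\Omega f_n\,d\mu$.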


3. Let $f = f(x, y)$ be a real-valued function of two real variables, defined for $a < y < b$, $c < x < d$, such that f is a Borel measurable function of y for each fixed x. Assume that for each x, $f(x, \cdot)$ is integrable over (a, b) (with respect to Lebesgue measure). Suppose that the partial derivative $f_1(x, y)$ of f with respect to x exists for all (x, y), and suppose there is a Borel measurable $h: (a, b) \to R$ such that $|f_1(x, y)| \le h(y)$ for all x, y, where $\int_a^b h(y)\,dy < \infty$.

Show that $d[\int_a^b f(x, y)\,dy]/dx$ exists for all $x \in (c, d)$, and equals $\int_a^b f_1(x, y)\,dy$. [It must be verified that $f_1(x, \cdot)$ is Borel measurable for each x.]


4. If $\mu$ is a measure on $(\Omega, \mathcal{F})$ and $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$, use Fatou's lemma to show that

$$\mu\Big(\liminf_n A_n\Big) \le \liminf_{n\to\infty} \mu(A_n).$$

If $\mu$ is finite, show that

$$\mu\Big(\limsup_n A_n\Big) \ge \limsup_{n\to\infty} \mu(A_n).$$

Thus if $\mu$ is finite and $A = \lim_n A_n$, then $\mu(A) = \lim_{n\to\infty} \mu(A_n)$. (For another proof of this, see Section 1.2, Problem 10.)


5. Give an example of a sequence of Lebesgue integrable functions $f_n$ converging everywhere to a Lebesgue integrable function f, such that

$$\lim_{n\to\infty} \int_{-\infty}^\infty f_n(x)\,dx < \int_{-\infty}^\infty f(x)\,dx.$$

Thus the hypotheses of the dominated convergence theorem and Fatou's lemma cannot be dropped.


6. (a) Show that $\int_1^\infty e^{-t} \ln t\,dt = \lim_{n\to\infty} \int_1^n [1 - (t/n)]^n \ln t\,dt$.

(b) Show that $\int_0^1 e^{-t} \ln t\,dt = \lim_{n\to\infty} \int_0^1 [1 - (t/n)]^n \ln t\,dt$.

7. If $(\Omega, \mathcal{F}, \mu)$ is the completion of $(\Omega, \mathcal{F}_0, \mu)$ and f is a Borel measurable function on $(\Omega, \mathcal{F})$, show that there is a Borel measurable function g on $(\Omega, \mathcal{F}_0)$ such that $f = g$, except on a subset of a set in $\mathcal{F}_0$ of measure 0. (Start with indicators.)

8. If f is a Borel measurable function from R to R and $a \in R$, show that

$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^\infty f(x - a)\,dx$$

in the sense that if one integral exists, so does the other, and the two are equal. (Start with indicators.)


1.7 COMPARISON OF LEBESGUE AND RIEMANN INTEGRALS

In this section we show that integration with respect to Lebesgue measure 
is more general than Riemann integration, and we obtain a precise criterion 
for Riemann integrability. 

Let [a, b] be a bounded closed interval of reals, and let f be a bounded real-valued function on [a, b], assumed fixed throughout the discussion. If $P: a = x_0 < x_1 < \cdots < x_n = b$ is a partition of [a, b], we may construct the upper and lower sums of f relative to P as follows.

Let

$$M_i = \sup\{f(y) : x_{i-1} < y \le x_i\}, \qquad i = 1, \ldots, n,$$

$$m_i = \inf\{f(y) : x_{i-1} < y \le x_i\}, \qquad i = 1, \ldots, n,$$

and define step functions $\alpha$ and $\beta$, called the upper and lower functions corresponding to P, by

$$\alpha(x) = M_i \text{ if } x_{i-1} < x \le x_i, \qquad i = 1, \ldots, n,$$
$$\beta(x) = m_i \text{ if } x_{i-1} < x \le x_i, \qquad i = 1, \ldots, n$$

[$\alpha(a)$ and $\beta(a)$ may be chosen arbitrarily]. The upper and lower sums are given by

$$U(P) = \sum_{i=1}^n M_i(x_i - x_{i-1}),$$

$$L(P) = \sum_{i=1}^n m_i(x_i - x_{i-1}).$$
Now we take as a measure space $\Omega = [a, b]$, $\mathcal{F} = \overline{\mathcal{B}}[a, b]$, the Lebesgue measurable subsets of [a, b], $\mu$ = Lebesgue measure. Since $\alpha$ and $\beta$ are simple functions, we have

$$U(P) = \int_a^b \alpha\,d\mu, \qquad L(P) = \int_a^b \beta\,d\mu.$$
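
For a concrete feel for the upper and lower sums, the following sketch computes U(P) and L(P) exactly for $f(x) = x^2$ on [0, 1] with uniform partitions. It is an illustration only: the function name `upper_lower_sums`, the choice of f, and the restriction to continuous increasing f (so that the sup and inf over each subinterval are attained at its right and left endpoints) are assumptions, not part of the text.

```python
from fractions import Fraction


def upper_lower_sums(f, a, b, n):
    """U(P) and L(P) for the uniform partition of [a, b] into n subintervals.

    Assumes f is continuous and increasing on [a, b], so the sup and inf of f
    over (x_{i-1}, x_i] are f(x_i) and f(x_{i-1}) respectively.
    """
    xs = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return upper, lower


def square(x):
    return x * x


for n in (1, 2, 4, 8, 16):
    U, L = upper_lower_sums(square, Fraction(0), Fraction(1), n)
    print(n, L, U, U - L)
```

For this f the gap U(P) - L(P) equals 1/n, so the two sums squeeze the value 1/3 as the partition is refined; doubling n at each step makes every partition a refinement of the previous one.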


Now let $P_1, P_2, \ldots$ be a sequence of partitions of [a, b] such that $P_{k+1}$ is a refinement of $P_k$ for each k, and such that $|P_k|$ (the length of the largest subinterval of $P_k$) approaches 0 as $k \to \infty$. If $\alpha_k$ and $\beta_k$ are the upper and lower functions corresponding to $P_k$, then

$$\alpha_1 \ge \alpha_2 \ge \cdots \ge f \ge \cdots \ge \beta_2 \ge \beta_1.$$

Thus $\alpha_k$ and $\beta_k$ approach limit functions $\alpha$ and $\beta$. If $|f|$ is bounded by M, then all $|\alpha_k|$ and $|\beta_k|$ are bounded by M as well, and the function that is constant at M is integrable on [a, b] with respect to $\mu$, since

$$\mu[a, b] = b - a < \infty.$$

By the dominated convergence theorem,

$$\lim_{k\to\infty} U(P_k) = \lim_{k\to\infty} \int_a^b \alpha_k\,d\mu = \int_a^b \alpha\,d\mu,$$

and

$$\lim_{k\to\infty} L(P_k) = \lim_{k\to\infty} \int_a^b \beta_k\,d\mu = \int_a^b \beta\,d\mu.$$

We shall need one other fact, namely that if x is not an endpoint of any of the subintervals of the $P_k$,

f is continuous at x iff $\alpha(x) = f(x) = \beta(x)$.

This follows by a standard $\varepsilon$-$\delta$ argument.

If $\lim_{k\to\infty} U(P_k) = \lim_{k\to\infty} L(P_k)$ = a finite number r, independent of the particular sequence of partitions, f is said to be Riemann integrable on [a, b], and $r = r_{ab}(f)$ is said to be the (value of the) Riemann integral of f on [a, b]. The above argument shows that f is Riemann integrable iff

$$\int_a^b \alpha\,d\mu = \int_a^b \beta\,d\mu = r,$$

independent of the particular sequence of partitions. If f is Riemann integrable,

$$r_{ab}(f) = \int_a^b \alpha\,d\mu = \int_a^b \beta\,d\mu.$$

We are now ready for the main results. 


1.7.1 Theorem. Let f be a bounded real-valued function on [a, b]. 


(a) The function f is Riemann integrable on [a, b] iff f is continuous 
almost everywhere on [a, b] (with respect to Lebesgue measure). 

(b) If f is Riemann integrable on [a, b], then f is integrable with respect 
to Lebesgue measure on [a, b], and the two integrals are equal. 


Proof. (a) If f is Riemann integrable,

$$r_{ab}(f) = \int_a^b \alpha\,d\mu = \int_a^b \beta\,d\mu.$$

As $\beta \le f \le \alpha$, 1.6.6(b) applied to $\alpha - \beta$ yields $\alpha = f = \beta$ a.e.; hence f is continuous a.e. Conversely, assume f is continuous a.e.; then $\alpha = f = \beta$ a.e. Now $\alpha$ and $\beta$ are limits of simple functions, and hence are Borel measurable. Thus f differs from a measurable function on a subset of a set of measure 0, and therefore f is measurable because of the completeness of the measure space. (See Section 1.5, Problem 4.) Since f is bounded, it is integrable with respect to $\mu$, and since $\alpha = f = \beta$ a.e., we have

$$\int_a^b \alpha\,d\mu = \int_a^b \beta\,d\mu = \int_a^b f\,d\mu \qquad (1)$$

independent of the particular sequence of partitions. Therefore f is Riemann integrable.

(b) If f is Riemann integrable, then f is continuous a.e. by part (a). But then Eq. (1) yields $r_{ab}(f) = \int_a^b f\,d\mu$, as desired. □


Theorem 1.7.1 holds equally well in n dimensions, with [a, b] replaced by 
a closed bounded interval of R”; the proof is essentially the same. 

A somewhat more complicated situation arises with improper integrals; here the interval of integration is infinite or the function f is unbounded. Some results are given in Problem 3.

We have seen that convenient conditions exist that allow the interchange of limit operations on Lebesgue integrable functions. (For example, see Problems 1-3 of Section 1.6.) The corresponding results for Riemann integrable functions are more complicated, basically because the limit of a sequence of Riemann integrable functions need not be Riemann integrable, even if the entire sequence is uniformly bounded (see Problem 4). Thus Riemann integrability of the limit function must be added as a hypothesis, and this is a serious limitation on the scope of the results.


Problems 


1. The function defined on [0, 1] by f(x) = 1 if x is irrational, and f(x) = 0 
if x is rational, is the standard example of a function that is Lebesgue 
integrable (it is 1 a.e.) but not Riemann integrable. But what is wrong 
with the following reasoning? 

If we consider the behavior of f on the irrationals, f assumes the 
constant value 1 and is therefore continuous. Since the rationals have 
Lebesgue measure 0, f is therefore continuous almost everywhere and 
hence is Riemann integrable. 


2. Let f be a bounded real-valued function on the bounded closed interval [a, b]. Let F be an increasing right-continuous function on [a, b] with corresponding Lebesgue-Stieltjes measure $\mu$ (defined on the Borel subsets of [a, b]).

Define $M_i$, $m_i$, $\alpha$, and $\beta$ as in 1.7, and take

$$U(P) = \sum_{i=1}^n M_i\,(F(x_i) - F(x_{i-1})) = \int_a^b \alpha\,d\mu,$$

$$L(P) = \sum_{i=1}^n m_i\,(F(x_i) - F(x_{i-1})) = \int_a^b \beta\,d\mu,$$

where $\int_a^b$ indicates that the integration is over (a, b]. If $\{P_k\}$ is a sequence of partitions with $|P_k| \to 0$ and $P_{k+1}$ refining $P_k$, with $\alpha_k$ and $\beta_k$ the upper and lower functions corresponding to $P_k$,

$$\lim_{k\to\infty} U(P_k) = \int_a^b \alpha\,d\mu,$$

$$\lim_{k\to\infty} L(P_k) = \int_a^b \beta\,d\mu,$$

where $\alpha = \lim_{k\to\infty} \alpha_k$, $\beta = \lim_{k\to\infty} \beta_k$. If $U(P_k)$ and $L(P_k)$ approach the same limit $r_{ab}(f; F)$ (independent of the particular sequence of partitions), this number is called the Riemann-Stieltjes integral of f with respect to F on [a, b], and f is said to be Riemann-Stieltjes integrable with respect to F on [a, b].

(a) Show that f is Riemann-Stieltjes integrable iff f is continuous a.e. $[\mu]$ on [a, b].

(b) Show that if f is Riemann-Stieltjes integrable, then f is integrable with respect to the completion of the measure $\mu$, and the two integrals are equal.

3. If $f: R \to R$, the improper Riemann integral of f may be defined as

$$r(f) = \lim_{a \to -\infty,\ b \to \infty} r_{ab}(f)$$

if the limit exists and is finite.


(a) Show that if f has an improper Riemann integral, it is continuous 
a.e. [Lebesgue measure] on R, but not conversely. 


(b) If f is nonnegative and has an improper Riemann integral, show that 
f is integrable with respect to the completion of Lebesgue measure, 
and the two integrals are equal. Give a counterexample to this result 
if the nonnegativity hypothesis is dropped. 


4. Give an example of a sequence of functions $f_n$ on [a, b] such that each $f_n$ is Riemann integrable, $|f_n| \le 1$ for all n, $f_n \to f$ everywhere, but f is not Riemann integrable.


Note: References on measure and integration will be given at the end of 
Chapter 2. 


CHAPTER 


2 


FURTHER RESULTS IN MEASURE AND 
INTEGRATION THEORY 


2.1 INTRODUCTION 

This chapter consists of a variety of applications of the basic integration theory developed in Chapter 1. Perhaps the most important result is the Radon-Nikodym theorem, which is fundamental in modern probability theory and other parts of analysis. It will be instructive to consider a special case of this result before proceeding to the general theory. Suppose that F is a distribution function on R, and assume that F has a jump of magnitude $a_k$ at the point $x_k$, $k = 1, 2, \ldots$. Let us subtract out the discontinuities of F; specifically let $\mu_1$ be a measure concentrated on $\{x_1, x_2, \ldots\}$, with $\mu_1\{x_k\} = a_k$ for all k, and let $F_1$ be a distribution function corresponding to $\mu_1$. Then $G = F - F_1$ is a continuous distribution function, so that the corresponding Lebesgue-Stieltjes measure $\lambda$ satisfies $\lambda\{x\} = 0$ for all x. Now in any "practical" case, we can write $G(x) = \int_{-\infty}^x f(t)\,dt$, $x \in R$, for some nonnegative Borel measurable function f (the way to find f is to differentiate G). It follows that $\lambda(B) = \int_B f(x)\,dx$ for all $B \in \mathcal{B}(R)$. To see this, observe that if $\lambda'(B) = \int_B f(x)\,dx$, then $\lambda'$ is a measure on $\mathcal{B}(R)$ and $\lambda'(a, b] = G(b) - G(a)$; thus $\lambda'$ is the Lebesgue-Stieltjes measure determined by G; in other words, $\lambda' = \lambda$.

It is natural to conjecture that if $\lambda$ is a measure on $\mathcal{B}(R)$ and $\lambda\{x\} = 0$ for all x, then we can write $\lambda(B) = \int_B f(x)\,dx$, $B \in \mathcal{B}(R)$, for some nonnegative Borel measurable f. However, as found by Lebesgue, the conjecture is false unless the hypothesis is strengthened. Not only must we assume that $\lambda$ assigns measure 0 to singletons, but in fact we must assume that $\lambda$ is absolutely continuous with respect to Lebesgue measure $\mu$, that is, if $\mu(B) = 0$, then $\lambda(B) = 0$. In general, $\lambda$ may be represented as the sum of two measures $\lambda_1$ and $\lambda_2$, where $\lambda_1$ is absolutely continuous with respect to $\mu$ and $\lambda_2$ is singular with respect to $\mu$, which means that $\lambda_2$ is concentrated on a set of Lebesgue measure 0. A simple example of a measure singular with respect to $\mu$ is one that is concentrated on a countable set; however, as we shall see, more complicated examples exist.

The first step in the development of the general Radon-Nikodym theorem is the Jordan-Hahn decomposition, which represents a countably additive set function as the difference of two measures.

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, h a Borel measurable function such that $\int_\Omega h\,d\mu$ exists. If $\lambda(A) = \int_A h\,d\mu$, $A \in \mathcal{F}$, then by 1.6.1, $\lambda$ is a countably additive set function on $\mathcal{F}$. We call $\lambda$ the indefinite integral of h (with respect to $\mu$). If $\mu$ is Lebesgue measure and $A = [a, x]$, then $\lambda(A) = \int_a^x h(y)\,dy$, the familiar indefinite integral of calculus. As we noted after the proof of 1.6.1, $\lambda$ is the difference of two measures, at least one of which is finite. We are going to show that any countably additive set function can be represented in this way. First, a preliminary result.


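2.1.1 Theorem. Let $\lambda$ be a countably additive extended real-valued set function on the $\sigma$-field $\mathcal{F}$ of subsets of $\Omega$. Then $\lambda$ assumes a maximum and a minimum value, that is, there are sets $C, D \in \mathcal{F}$ such that

$$\lambda(C) = \sup\{\lambda(A) : A \in \mathcal{F}\} \quad \text{and} \quad \lambda(D) = \inf\{\lambda(A) : A \in \mathcal{F}\}.$$

Before giving the proof, let us look at some special cases. If $\lambda$ is a measure, the result is trivial: take $C = \Omega$, $D = \emptyset$. If $\lambda$ is the indefinite integral of h with respect to $\mu$, we may write

$$\lambda(A) = \int_A h\,d\mu = \int_{A \cap \{\omega : h(\omega) \ge 0\}} h\,d\mu + \int_{A \cap \{\omega : h(\omega) < 0\}} h\,d\mu.$$

Thus

$$\int_{\{\omega : h(\omega) < 0\}} h\,d\mu \le \lambda(A) \le \int_{\{\omega : h(\omega) \ge 0\}} h\,d\mu.$$

Therefore we may take $C = \{\omega : h(\omega) \ge 0\}$, $D = \{\omega : h(\omega) < 0\}$.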


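Proof. First consider the sup. We may assume that $\lambda < \infty$, for if $\lambda(A_0) = \infty$ we take $C = A_0$. Let $A_n \in \mathcal{F}$ with $\lambda(A_n) \to \sup \lambda$, and let $A = \bigcup_{n=1}^\infty A_n \in \mathcal{F}$.

For each n, we may partition A into $2^n$ disjoint subsets $A_{nm}$, where each $A_{nm}$ is of the form $A_1^* \cap A_2^* \cap \cdots \cap A_n^*$, with $A_j^*$ either $A_j$ or $A - A_j$. For example, if n = 3, we have (with intersections written as products)

$$A = A_1A_2A_3 \cup A_1A_2A_3' \cup A_1A_2'A_3 \cup A_1A_2'A_3' \cup A_1'A_2A_3 \cup A_1'A_2A_3' \cup A_1'A_2'A_3 \cup A_1'A_2'A_3', \quad \text{where } A_j' = A - A_j.$$

Let $B_n = \bigcup_m \{A_{nm} : \lambda(A_{nm}) \ge 0\}$; set $B_n = \emptyset$ if $\lambda(A_{nm}) < 0$ for all m. Now each $A_n$ is a finite union of some of the $A_{nm}$, hence $\lambda(A_n) \le \lambda(B_n)$ by definition of $B_n$. Also, if $n' > n$, each $A_{n'm'}$ is either a subset of a given $A_{nm}$ or disjoint from it (for example, $A_1A_2'A_3' \subset A_1A_2'$, and $A_1A_2'A_3'$ is disjoint from $A_1A_2$, $A_1'A_2$, and $A_1'A_2'$). Thus $\bigcup_{k=n}^r B_k$ can be expressed as a union of $B_n$ and sets E disjoint from $B_n$ such that $\lambda(E) \ge 0$. [Note that, for example, if $\lambda(A_1A_2') = \lambda(A_1A_2'A_3) + \lambda(A_1A_2'A_3') \ge 0$, it may happen that $\lambda(A_1A_2'A_3) < 0$, $\lambda(A_1A_2'A_3') > 0$, so the sequence $\{B_n\}$ need not be monotone.] Consequently,

$$\lambda(A_n) \le \lambda(B_n) \le \lambda\Big(\bigcup_{k=n}^r B_k\Big) \to \lambda\Big(\bigcup_{k=n}^\infty B_k\Big) \quad \text{as } r \to \infty \text{ by 1.2.7(a)}.$$

Let $C = \limsup_n B_n = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty B_k$. Now $\bigcup_{k=n}^\infty B_k \downarrow C$, and $0 \le \lambda(\bigcup_{k=n}^\infty B_k) < \infty$ for all n. By 1.2.7(b), $\lambda(\bigcup_{k=n}^\infty B_k) \to \lambda(C)$ as $n \to \infty$. Thus

$$\sup \lambda = \lim_{n\to\infty} \lambda(A_n) \le \lim_{n\to\infty} \lambda\Big(\bigcup_{k=n}^\infty B_k\Big) = \lambda(C) \le \sup \lambda;$$

hence $\lambda(C) = \sup \lambda$. The above argument applied to $-\lambda$ yields $D \in \mathcal{F}$ with $\lambda(D) = \inf \lambda$. □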


We now prove the main theorem of this section. 


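2.1.2 Jordan-Hahn Decomposition Theorem. Let $\lambda$ be a countably additive
extended real-valued set function on the $\sigma$-field $\mathcal{F}$. Define

\[
\begin{aligned}
\lambda^+(A) &= \sup\{\lambda(B): B \in \mathcal{F},\ B \subset A\}, \\
\lambda^-(A) &= -\inf\{\lambda(B): B \in \mathcal{F},\ B \subset A\}.
\end{aligned}
\]

Then $\lambda^+$ and $\lambda^-$ are measures on $\mathcal{F}$ and $\lambda = \lambda^+ - \lambda^-$.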


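PROOF. We may assume $\lambda$ never takes on the value $-\infty$. For if $-\infty$ belongs
to the range of $\lambda$, $+\infty$ does not, by definition of a countably additive
set function. Thus $-\lambda$ never takes on the value $-\infty$. But $(-\lambda)^+ = \lambda^-$ and
$(-\lambda)^- = \lambda^+$, so that if the theorem is proved for $-\lambda$ it holds for $\lambda$ as well.

Let $D$ be a set on which $\lambda$ attains its minimum, as in 2.1.1. Since $\lambda(\emptyset) = 0$,
we have $-\infty < \lambda(D) \le 0$. We claim that

\[
\lambda(A \cap D) \le 0, \qquad \lambda(A \cap D^c) \ge 0 \qquad\text{for all } A \in \mathcal{F}. \tag{1}
\]

For if $\lambda(A \cap D) > 0$, then $\lambda(D) = \lambda(A \cap D) + \lambda(A^c \cap D)$. Since $\lambda(D)$ is
finite, so are $\lambda(A \cap D)$ and $\lambda(A^c \cap D)$; hence $\lambda(A^c \cap D) = \lambda(D) - \lambda(A \cap D)
< \lambda(D)$, contradicting the fact that $\lambda(D) = \inf \lambda$. If $\lambda(A \cap D^c) < 0$, then
$\lambda(D \cup (A \cap D^c)) = \lambda(D) + \lambda(A \cap D^c) < \lambda(D)$, a contradiction.

We now show that

\[
\lambda^+(A) = \lambda(A \cap D^c), \qquad \lambda^-(A) = -\lambda(A \cap D). \tag{2}
\]

The theorem will follow from this. We have, for $B \in \mathcal{F}$, $B \subset A$,

\[
\begin{aligned}
\lambda(B) &= \lambda(B \cap D) + \lambda(B \cap D^c) \\
           &\le \lambda(B \cap D^c) && \text{by (1)} \\
           &\le \lambda(B \cap D^c) + \lambda((A - B) \cap D^c) \\
           &= \lambda(A \cap D^c).
\end{aligned}
\]

Thus $\lambda^+(A) \le \lambda(A \cap D^c)$. But $\lambda(A \cap D^c) \le \lambda^+(A)$ by definition of $\lambda^+$, proving
the first assertion. Similarly,

\[
\begin{aligned}
\lambda(B) &= \lambda(B \cap D) + \lambda(B \cap D^c) \\
           &\ge \lambda(B \cap D) \\
           &\ge \lambda(B \cap D) + \lambda((A - B) \cap D) \\
           &= \lambda(A \cap D).
\end{aligned}
\]

Hence $-\lambda^-(A) \ge \lambda(A \cap D)$. But $\lambda(A \cap D) \ge -\lambda^-(A)$ by definition of $\lambda^-$,
completing the proof. □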
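
On a finite set every signed measure is given by point weights, so the decomposition in Eq. (2) can be checked by brute force. The following is a minimal sketch under that assumption; the weights and the names `signed_measure`, `upper_variation`, and `lower_variation` are hypothetical and serve only as illustration.

```python
from itertools import combinations

def signed_measure(weights, subset):
    """lambda(A) = sum of the point weights over A."""
    return sum(weights[w] for w in subset)

def subsets(points):
    """All subsets of a finite collection, as frozensets."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)

def upper_variation(weights, A):
    """lambda^+(A) = sup{lambda(B): B subset of A}, by brute force."""
    return max(signed_measure(weights, B) for B in subsets(A))

def lower_variation(weights, A):
    """lambda^-(A) = -inf{lambda(B): B subset of A}, by brute force."""
    return -min(signed_measure(weights, B) for B in subsets(A))

# Hypothetical example: four points with integer weights.
weights = {"a": 3, "b": -2, "c": 0, "d": -5}
omega = frozenset(weights)

# A set D on which lambda attains its minimum: the points of negative weight.
D = frozenset(w for w in omega if weights[w] < 0)

# Check Eq. (2) for every subset A of omega.
for A in subsets(omega):
    assert upper_variation(weights, A) == signed_measure(weights, A - D)
    assert lower_variation(weights, A) == -signed_measure(weights, A & D)
```

Here the point `"c"` of weight 0 could be moved into `D` without changing either variation, which illustrates why the set `D` is determined only up to a set of total variation 0.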


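2.1.3 Corollaries. Let $\lambda$ be a countably additive extended real-valued set
function on the $\sigma$-field $\mathcal{F}$.

(a) The set function $\lambda$ is the difference of two measures, at least one of
which is finite.

(b) If $\lambda$ is finite ($\lambda(A)$ is never $\pm\infty$ for any $A \in \mathcal{F}$), then $\lambda$ is bounded.

(c) There is a set $D \in \mathcal{F}$ such that $\lambda(A \cap D) \le 0$ and $\lambda(A \cap D^c) \ge 0$ for
all $A \in \mathcal{F}$.

(d) If $D$ is any set in $\mathcal{F}$ such that $\lambda(A \cap D) \le 0$ and $\lambda(A \cap D^c) \ge 0$ for all
$A \in \mathcal{F}$, then $\lambda^+(A) = \lambda(A \cap D^c)$ and $\lambda^-(A) = -\lambda(A \cap D)$ for all $A \in \mathcal{F}$.

(e) If $E$ is another set in $\mathcal{F}$ such that $\lambda(A \cap E) \le 0$ and $\lambda(A \cap E^c) \ge 0$ for
all $A \in \mathcal{F}$, then $|\lambda|(D \,\triangle\, E) = 0$, where $|\lambda| = \lambda^+ + \lambda^-$.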


PROOF. (a) If $\lambda > -\infty$, then in 2.1.2, $\lambda^-$ is finite; if $\lambda < +\infty$, $\lambda^+$ is finite
[see Eq. (2)].

(b) In 2.1.2, $\lambda^+$ and $\lambda^-$ are both finite; hence for any $A \in \mathcal{F}$,
$|\lambda(A)| \le \lambda^+(\Omega) + \lambda^-(\Omega) < \infty$.

(c) This follows from (1) of 2.1.2.

(d) Repeat the part of the proof of 2.1.2 after Eq. (2).

(e) By (d), $\lambda^+(A) = \lambda(A \cap D^c)$, $A \in \mathcal{F}$; take $A = D \cap E^c$ to obtain
$\lambda^+(D \cap E^c) = 0$. Also by (d), $\lambda^+(A) = \lambda(A \cap E^c)$, $A \in \mathcal{F}$; take $A = D^c \cap E$
to obtain $\lambda^+(D^c \cap E) = 0$. Therefore $\lambda^+(D \,\triangle\, E) = 0$. The same argument
using $\lambda^-(A) = -\lambda(A \cap D) = -\lambda(A \cap E)$ shows that $\lambda^-(D \,\triangle\, E) = 0$. The
result follows. □


Corollary 2.1.3(d) is often useful in finding the Jordan-Hahn decomposition
of a particular set function (see Problems 1 and 2).


2.1.4 Terminology. We call $\lambda^+$ the upper variation or positive part of $\lambda$, $\lambda^-$
the lower variation or negative part, $|\lambda| = \lambda^+ + \lambda^-$ the total variation. Since
$\lambda = \lambda^+ - \lambda^-$, it follows that $|\lambda(A)| \le |\lambda|(A)$, $A \in \mathcal{F}$. For a sharper result, see
Problem 4.

Note that if $A \in \mathcal{F}$, then $|\lambda|(A) = 0$ iff $\lambda(B) = 0$ for all $B \in \mathcal{F}$, $B \subset A$.

The phrase signed measure is sometimes used for the difference of two
measures. By 2.1.3(a), this is synonymous (on a $\sigma$-field) with countably additive
set function.


Problems 


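1. Let $P$ be an arbitrary probability measure on $\mathcal{B}(\mathbb{R})$, and let $Q$ be point
mass at 0, that is, $Q(B) = 1$ if $0 \in B$, $Q(B) = 0$ if $0 \notin B$. Find the
Jordan-Hahn decomposition of the signed measure $\lambda = P - Q$.

2. Let $\lambda(A) = \int_A f\,d\mu$, $A$ in the $\sigma$-field $\mathcal{F}$, where $\int_\Omega f\,d\mu$ exists; thus $\lambda$ is a
signed measure on $\mathcal{F}$. Show that

\[
\lambda^+(A) = \int_A f^+\,d\mu, \qquad \lambda^-(A) = \int_A f^-\,d\mu, \qquad |\lambda|(A) = \int_A |f|\,d\mu.
\]

3. If a signed measure $\lambda$ on the $\sigma$-field $\mathcal{F}$ is the difference of two measures
$\lambda_1$ and $\lambda_2$, show that $\lambda_1 \ge \lambda^+$, $\lambda_2 \ge \lambda^-$.

4. Let $\lambda$ be a signed measure on the $\sigma$-field $\mathcal{F}$. Show that

\[
|\lambda|(A) = \sup\Bigl\{\sum_{i=1}^{n} |\lambda(E_i)|: E_1, E_2, \ldots, E_n \text{ disjoint measurable subsets of } A,\ n = 1, 2, \ldots\Bigr\}.
\]

Consequently, if $\lambda_1$ and $\lambda_2$ are signed measures on $\mathcal{F}$,
then $|\lambda_1 + \lambda_2| \le |\lambda_1| + |\lambda_2|$.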


2.2 RADON-NIKODYM THEOREM AND RELATED RESULTS

If $(\Omega, \mathcal{F}, \mu)$ is a measure space, then $\lambda(A) = \int_A g\,d\mu$, $A \in \mathcal{F}$, defines a
signed measure if $\int_\Omega g\,d\mu$ exists. Furthermore, if $A \in \mathcal{F}$ and $\mu(A) = 0$, then
$\lambda(A) = 0$. For $gI_A = 0$ on $A^c$, so that $gI_A = 0$ a.e. [$\mu$], and the result follows
from 1.6.5(a).

If $\mu$ is a measure on the $\sigma$-field $\mathcal{F}$, and $\lambda$ is a signed measure on $\mathcal{F}$, we
say that $\lambda$ is absolutely continuous with respect to $\mu$ (notation $\lambda \ll \mu$) iff
$\mu(A) = 0$ implies $\lambda(A) = 0$ ($A \in \mathcal{F}$). Thus if $\lambda$ is an indefinite integral with
respect to $\mu$, then $\lambda \ll \mu$. The Radon-Nikodym theorem is an assertion in the
converse direction; if $\lambda \ll \mu$ (and $\mu$ is $\sigma$-finite on $\mathcal{F}$), then $\lambda$ is an indefinite
integral with respect to $\mu$. As we shall see, large areas of analysis are based
on this theorem.


2.2.1 Radon-Nikodym Theorem. Let $\mu$ be a $\sigma$-finite measure and $\lambda$ a signed
measure on the $\sigma$-field $\mathcal{F}$ of subsets of $\Omega$. Assume that $\lambda$ is absolutely
continuous with respect to $\mu$. Then there is a Borel measurable function
$g: \Omega \to \overline{\mathbb{R}}$ such that

\[
\lambda(A) = \int_A g\,d\mu \qquad\text{for all } A \in \mathcal{F}.
\]

If $h$ is another such function, then $g = h$ a.e. [$\mu$].


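PROOF. The uniqueness statement follows from 1.6.11. We break the
existence proof into several parts.

(a) Assume $\lambda$ and $\mu$ are finite measures.

Let $\mathcal{S}$ be the set of all nonnegative $\mu$-integrable functions $f$ such that
$\int_A f\,d\mu \le \lambda(A)$ for all $A \in \mathcal{F}$. Partially order $\mathcal{S}$ by calling $f \le g$ iff $f \le g$ a.e.
[$\mu$]. Let $s = \sup\{\int_\Omega f\,d\mu: f \in \mathcal{S}\} \le \lambda(\Omega) < \infty$. ($\mathcal{S}$ is a nonempty collection
since it contains the zero function.) We are going to produce a maximal element
of $\mathcal{S}$, and we first note that if $f$ and $g$ belong to $\mathcal{S}$ and $h = \max(f, g)$ then
$h \in \mathcal{S}$. For if $B$ is the set on which $f \ge g$ and $C$ the set on which $f < g$,
then

\[
\int_A h\,d\mu = \int_{A \cap B} f\,d\mu + \int_{A \cap C} g\,d\mu \le \lambda(A \cap B) + \lambda(A \cap C) = \lambda(A).
\]

Now let $f_1, f_2, \ldots$ be a sequence in $\mathcal{S}$ such that $\int_\Omega f_n\,d\mu \to s$, and let
$g_n = \max(f_1, \ldots, f_n)$. Then $g_n \in \mathcal{S}$, $g_n$ increases to a limit $g$, and $\int_\Omega g\,d\mu = s$
(use the monotone convergence theorem and the fact that $g_n \ge f_n$). We claim
that $g \in \mathcal{S}$; since $\int_\Omega g\,d\mu = s$, it will follow that $g$ is a maximal element of
$\mathcal{S}$. To show that $g$ belongs to $\mathcal{S}$ let $A$ be any set in $\mathcal{F}$. Then

\[
0 \le g_n I_A \uparrow g I_A, \quad\text{so}\quad \int_A g_n\,d\mu = \int_\Omega g_n I_A\,d\mu \uparrow \int_\Omega g I_A\,d\mu = \int_A g\,d\mu.
\]

But $\int_A g_n\,d\mu \le \lambda(A)$ for all $n$, and therefore $\int_A g\,d\mu \le \lambda(A)$.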
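
Now that we have our maximal element $g$, let $\lambda_1(A) = \lambda(A) - \int_A g\,d\mu$,
$A \in \mathcal{F}$. Then $\lambda_1$ is a measure, $\lambda_1 \ll \mu$, and $\lambda_1(\Omega) < \infty$. If $\lambda_1$ is not identically
0, then $\lambda_1(\Omega) > 0$, hence

\[
\mu(\Omega) - k\lambda_1(\Omega) < 0 \qquad\text{for some } k > 0. \tag{1}
\]

Apply 2.1.3(c) to the signed measure $\mu - k\lambda_1$ to obtain $D \in \mathcal{F}$ such that for
all $A \in \mathcal{F}$,

\[
\mu(A \cap D) - k\lambda_1(A \cap D) \le 0, \tag{2}
\]

and

\[
\mu(A \cap D^c) - k\lambda_1(A \cap D^c) \ge 0. \tag{3}
\]

We claim that $\mu(D) > 0$. For if $\mu(D) = 0$, then $\lambda(D) = 0$ by absolute
continuity, and therefore $\lambda_1(D) = 0$ by definition of $\lambda_1$. Take $A = \Omega$ in (3) to
obtain

\[
\begin{aligned}
0 &\le \mu(D^c) - k\lambda_1(D^c) \\
  &= \mu(\Omega) - k\lambda_1(\Omega) && \text{since } \mu(D) = \lambda_1(D) = 0 \\
  &< 0 && \text{by (1)},
\end{aligned}
\]

a contradiction. Define $h(\omega) = 1/k$, $\omega \in D$; $h(\omega) = 0$, $\omega \notin D$. If $A \in \mathcal{F}$,

\[
\begin{aligned}
\int_A h\,d\mu = \frac{1}{k}\,\mu(A \cap D) &\le \lambda_1(A \cap D) && \text{by (2)} \\
  &\le \lambda_1(A) = \lambda(A) - \int_A g\,d\mu.
\end{aligned}
\]

Thus $\int_A (h + g)\,d\mu \le \lambda(A)$. But $h + g > g$ on the set $D$, with $\mu(D) > 0$,
contradicting the maximality of $g$. Thus $\lambda_1 \equiv 0$, and the result follows.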

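(b) Assume $\mu$ is a finite measure, $\lambda$ a $\sigma$-finite measure.

Let $\Omega$ be the disjoint union of sets $A_n$ with $\lambda(A_n) < \infty$, and let
$\lambda_n(A) = \lambda(A \cap A_n)$, $A \in \mathcal{F}$, $n = 1, 2, \ldots$. By part (a) we find a nonnegative Borel
measurable $g_n$ with $\lambda_n(A) = \int_A g_n\,d\mu$, $A \in \mathcal{F}$. Thus $\lambda(A) = \int_A g\,d\mu$, where
$g = \sum_n g_n$.

(c) Assume $\mu$ is a finite measure, $\lambda$ an arbitrary measure.

Let $\mathcal{C}$ be the class of sets $C \in \mathcal{F}$ such that $\lambda$ on $C$ (that is, $\lambda$ restricted to
$\mathcal{F}_C = \{A \cap C: A \in \mathcal{F}\}$) is $\sigma$-finite; note that $\emptyset \in \mathcal{C}$, so $\mathcal{C}$ is not empty. Let
$s = \sup\{\mu(A): A \in \mathcal{C}\}$ and pick $C_n \in \mathcal{C}$ with $\mu(C_n) \to s$. If $C = \bigcup_{n=1}^\infty C_n$,
then $C \in \mathcal{C}$ by definition of $\mathcal{C}$, and $s \ge \mu(C) \ge \mu(C_n) \to s$; hence $\mu(C) = s$.

By part (b), there is a nonnegative $g': C \to \overline{\mathbb{R}}$, measurable relative to $\mathcal{F}_C$ and
$\mathcal{B}(\overline{\mathbb{R}})$, such that

\[
\lambda(A \cap C) = \int_{A \cap C} g'\,d\mu \qquad\text{for all } A \in \mathcal{F}.
\]

Now consider an arbitrary set $A \in \mathcal{F}$.

Case 1: Let $\mu(A \cap C^c) > 0$. Then $\lambda(A \cap C^c) = \infty$, for if $\lambda(A \cap C^c) < \infty$,
then $C \cup (A \cap C^c) \in \mathcal{C}$; hence

\[
s \ge \mu(C \cup (A \cap C^c)) = \mu(C) + \mu(A \cap C^c) > \mu(C) = s,
\]

a contradiction.

Case 2: Let $\mu(A \cap C^c) = 0$. Then $\lambda(A \cap C^c) = 0$ by absolute continuity.

Thus in either case, $\lambda(A \cap C^c) = \int_{A \cap C^c} \infty\,d\mu$. It follows that

\[
\lambda(A) = \lambda(A \cap C) + \lambda(A \cap C^c) = \int_A g\,d\mu,
\]

where $g = g'$ on $C$, $g = \infty$ on $C^c$.

(d) Assume $\mu$ is a $\sigma$-finite measure, $\lambda$ an arbitrary measure.

Let $\Omega$ be the union of disjoint sets $A_n$ with $\mu(A_n) < \infty$. By part (c),
there is a nonnegative function $g_n: A_n \to \overline{\mathbb{R}}$, measurable with respect to $\mathcal{F}_{A_n}$
and $\mathcal{B}(\overline{\mathbb{R}})$, such that $\lambda(A \cap A_n) = \int_{A \cap A_n} g_n\,d\mu$, $A \in \mathcal{F}$. We may write this
as $\lambda(A \cap A_n) = \int_A g_n\,d\mu$ where $g_n(\omega)$ is taken as 0 for $\omega \notin A_n$. Thus
$\lambda(A) = \sum_n \lambda(A \cap A_n) = \sum_n \int_A g_n\,d\mu = \int_A g\,d\mu$, where $g = \sum_n g_n$.

(e) Assume $\mu$ is a $\sigma$-finite measure, $\lambda$ an arbitrary signed measure.

Write $\lambda = \lambda^+ - \lambda^-$ where, say, $\lambda^-$ is finite. By part (d), there are
nonnegative Borel measurable functions $g_1$ and $g_2$ such that

\[
\lambda^+(A) = \int_A g_1\,d\mu, \qquad \lambda^-(A) = \int_A g_2\,d\mu, \qquad A \in \mathcal{F}.
\]

Since $\lambda^-$ is finite, $g_2$ is integrable; hence by 1.6.3 and 1.6.6(a),
$\lambda(A) = \int_A (g_1 - g_2)\,d\mu$. □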
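
When $\Omega$ is finite and both set functions are given by point weights, the function $g$ of the theorem is simply a ratio of weights. The sketch below is an illustration under that assumption, not part of the proof; the dictionaries and the names `radon_nikodym_density` and `integral` are hypothetical.

```python
from fractions import Fraction

def radon_nikodym_density(lam, mu):
    """Pointwise density g with lam(A) = sum over w in A of g(w) * mu(w).

    Assumes lam << mu: every point of mu-weight 0 has lam-weight 0.
    """
    g = {}
    for w in mu:
        if mu[w] == 0:
            if lam[w] != 0:
                raise ValueError("lam is not absolutely continuous w.r.t. mu")
            g[w] = Fraction(0)  # any value works on a mu-null set
        else:
            g[w] = Fraction(lam[w], mu[w])
    return g

def integral(g, mu, A):
    """Integral of g over A with respect to the point weights mu."""
    return sum(g[w] * mu[w] for w in A)

# Hypothetical weights: mu is a finite measure, lam a finite signed measure.
mu = {"a": 2, "b": 1, "c": 0, "d": 4}
lam = {"a": 1, "b": -3, "c": 0, "d": 6}
g = radon_nikodym_density(lam, mu)
```

The value of `g` at the $\mu$-null point `"c"` is arbitrary, which is the finite-space counterpart of the statement that $g$ is unique only a.e. [$\mu$].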


2.2.2 Corollaries. Under the hypothesis of 2.2.1,

(a) If $\lambda$ is finite, then $g$ is $\mu$-integrable, hence finite a.e. [$\mu$].

(b) If $|\lambda|$ is $\sigma$-finite, so that $\Omega$ can be expressed as a countable union of sets
$A_n$ such that $|\lambda|(A_n)$ is finite (equivalently $\lambda(A_n)$ is finite), then $g$ is finite a.e. [$\mu$].

(c) If $\lambda$ is a measure, then $g \ge 0$ a.e. [$\mu$].

PROOF. All results may be obtained by examining the proof of 2.2.1.
Alternatively, we may proceed as follows:

(a) Observe that $\lambda(\Omega) = \int_\Omega g\,d\mu$, finite by hypothesis.

(b) By (a), $g$ is finite a.e. [$\mu$] on each $A_n$, hence finite a.e. [$\mu$] on $\Omega$.

(c) Let $A = \{\omega: g(\omega) < 0\}$; then $0 \le \lambda(A) = \int_A g\,d\mu \le 0$. Thus $-gI_A$ is a
nonnegative function whose integral is 0, so that $gI_A = 0$ a.e. [$\mu$] by 1.6.6(b).
Since $gI_A < 0$ on $A$, we must have $\mu(A) = 0$. □


If $\lambda(A) = \int_A g\,d\mu$ for each $A \in \mathcal{F}$, $g$ is called the Radon-Nikodym derivative
or density of $\lambda$ with respect to $\mu$, written $d\lambda/d\mu$. If $\mu$ is Lebesgue measure,
then $g$ is often called simply the density of $\lambda$.

There are converse assertions to 2.2.2(a) and (c). Suppose that

\[
\lambda(A) = \int_A g\,d\mu, \qquad A \in \mathcal{F},
\]

where $\int_\Omega g\,d\mu$ is assumed to exist. If $g$ is $\mu$-integrable, then $\lambda$ is finite; if
$g \ge 0$ a.e. [$\mu$], then $\lambda \ge 0$, so that $\lambda$ is a measure. (Note that $\sigma$-finiteness of $\mu$
is not assumed.) However, the converse to 2.2.2(b) is false; if $g$ is finite a.e.
[$\mu$], $|\lambda|$ need not be $\sigma$-finite (see Problem 1).

We now consider a property that is in a sense opposite to absolute continuity. 


2.2.3 Definitions. Let $\mu_1$ and $\mu_2$ be measures on the $\sigma$-field $\mathcal{F}$. We say that
$\mu_1$ is singular with respect to $\mu_2$ (written $\mu_1 \perp \mu_2$) iff there is a set $A \in \mathcal{F}$
such that $\mu_1(A) = 0$ and $\mu_2(A^c) = 0$; note $\mu_1$ is singular with respect to $\mu_2$ iff
$\mu_2$ is singular with respect to $\mu_1$, so we may say that $\mu_1$ and $\mu_2$ are mutually
singular. If $\lambda_1$ and $\lambda_2$ are signed measures on $\mathcal{F}$, we say that $\lambda_1$ and $\lambda_2$ are
mutually singular iff $|\lambda_1| \perp |\lambda_2|$.

If $\mu_1 \perp \mu_2$, with $\mu_1(A) = \mu_2(A^c) = 0$, then $\mu_2$ only assigns positive
measure to subsets of $A$. Thus $\mu_2$ concentrates its total effect on a set of $\mu_1$-measure
0; on the other hand, if $\mu_2 \ll \mu_1$, $\mu_2$ can have no effect on sets of $\mu_1$-measure 0.

If $\lambda$ is a signed measure with positive part $\lambda^+$ and negative part $\lambda^-$, we
have $\lambda^+ \perp \lambda^-$ by 2.1.3(c) and (d).

Before establishing some facts about absolute continuity and singularity, 
we need the following lemma. Although the proof is quite simple, the result 
is applied very often in analysis, especially in probability theory. 


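2.2.4 Borel-Cantelli Lemma. If $A_1, A_2, \ldots \in \mathcal{F}$ and $\sum_{n=1}^\infty \mu(A_n) < \infty$,
then $\mu(\limsup_n A_n) = 0$.

PROOF. Recall that $\limsup_n A_n = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k$; hence

\[
\begin{aligned}
\mu(\limsup_n A_n) &\le \mu\Bigl(\bigcup_{k=n}^{\infty} A_k\Bigr) && \text{for all } n \\
                   &\le \sum_{k=n}^{\infty} \mu(A_k) \to 0 && \text{as } n \to \infty. \quad\square
\end{aligned}
\]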
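
As a worked illustration (the particular sets are chosen here for convenience and are not from the text), take Lebesgue measure $m$ on $[0, 1]$. The first case applies the lemma directly; the second suggests that summability is sufficient but not necessary for the conclusion.

```latex
% Case 1: A_n = (0, 1/n^2). The series of measures converges, and for n >= 2
% the tail bound from the proof can be estimated by an integral comparison.
\[
m(\limsup_n A_n) \le \sum_{k=n}^{\infty} \frac{1}{k^2}
  \le \int_{n-1}^{\infty} \frac{dx}{x^2} = \frac{1}{n-1} \to 0 .
\]
% Case 2: A_n = (0, 1/n). Here \sum_n m(A_n) = \sum_n 1/n diverges, yet
\[
\limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} \Bigl(0, \tfrac{1}{k}\Bigr)
  = \bigcap_{n=1}^{\infty} \Bigl(0, \tfrac{1}{n}\Bigr) = \emptyset ,
\]
% so m(\limsup_n A_n) = 0 even though the hypothesis of 2.2.4 fails.
```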


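2.2.5 Lemma. Let $\mu$ be a measure, and $\lambda_1$ and $\lambda_2$ signed measures, on the
$\sigma$-field $\mathcal{F}$.

(a) If $\lambda_1 \perp \mu$ and $\lambda_2 \perp \mu$, then $\lambda_1 + \lambda_2 \perp \mu$.

(b) If $\lambda_1 \ll \mu$, then $|\lambda_1| \ll \mu$, and conversely.

(c) If $\lambda_1 \ll \mu$ and $\lambda_2 \perp \mu$, then $\lambda_1 \perp \lambda_2$.

(d) If $\lambda_1 \ll \mu$ and $\lambda_1 \perp \mu$, then $\lambda_1 \equiv 0$.

(e) If $\lambda_1$ is finite, then $\lambda_1 \ll \mu$ iff $\lim_{\mu(A) \to 0} \lambda_1(A) = 0$.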


PROOF. (a) Let $\mu(A) = \mu(B) = 0$, $|\lambda_1|(A^c) = |\lambda_2|(B^c) = 0$. Then $\mu(A \cup B) = 0$
and $\lambda_1(C) = \lambda_2(C) = 0$ for every $C \in \mathcal{F}$ with $C \subset A^c \cap B^c$; hence
$|\lambda_1 + \lambda_2|[(A \cup B)^c] = 0$.

(b) Let $\mu(A) = 0$. If $\lambda_1^+(A) > 0$, then (see 2.1.2) $\lambda_1(B) > 0$ for some
$B \subset A$; since $\mu(B) = 0$, this is a contradiction. It follows that $\lambda_1^+$, and
similarly $\lambda_1^-$, are absolutely continuous with respect to $\mu$; hence $|\lambda_1| \ll \mu$. (This
may also be proved using Section 2.1, Problem 4.) The converse is clear.

(c) Let $\mu(A) = 0$, $|\lambda_2|(A^c) = 0$. By (b), $|\lambda_1|(A) = 0$, so $|\lambda_1| \perp |\lambda_2|$.

(d) By (c), $\lambda_1 \perp \lambda_1$; hence for some $A \in \mathcal{F}$, $|\lambda_1|(A) = |\lambda_1|(A^c) = 0$. Thus
$|\lambda_1|(\Omega) = 0$.

(e) If $\mu(A_n) \to 0$ implies $\lambda_1(A_n) \to 0$, and $\mu(A) = 0$, set $A_n \equiv A$ to
conclude that $\lambda_1(A) = 0$, so $\lambda_1 \ll \mu$.

Conversely, let $\lambda_1 \ll \mu$. If $\lim_{\mu(A) \to 0} |\lambda_1|(A) \ne 0$ we can find, for some
$\varepsilon > 0$, sets $A_n \in \mathcal{F}$ with $\mu(A_n) < 2^{-n}$ and $|\lambda_1|(A_n) \ge \varepsilon$ for all $n$. Let
$A = \limsup_n A_n$; by 2.2.4, $\mu(A) = 0$. But $|\lambda_1|(\bigcup_{k \ge n} A_k) \ge |\lambda_1|(A_n) \ge \varepsilon$ for all $n$;
hence by 1.2.7(b), $|\lambda_1|(A) \ge \varepsilon$, contradicting (b). Thus $\lim_{\mu(A) \to 0} |\lambda_1|(A) = 0$,
and the result follows since $|\lambda_1(A)| \le |\lambda_1|(A)$. □


If $\lambda_1$ is an indefinite integral with respect to $\mu$ (hence $\lambda_1 \ll \mu$), then 2.2.5(e)
has an easier proof. If $\lambda_1(A) = \int_A f\,d\mu$, $A \in \mathcal{F}$, then

\[
\int_A |f|\,d\mu = \int_{A \cap \{|f| \le n\}} |f|\,d\mu + \int_{A \cap \{|f| > n\}} |f|\,d\mu \le n\mu(A) + \int_{\{|f| > n\}} |f|\,d\mu.
\]

By 1.6.1 and 1.2.7(b), $\int_{\{|f| > n\}} |f|\,d\mu$ may be made less than $\varepsilon/2$ for large $n$,
say $n \ge N$. Fix $n = N$ and take $\mu(A) < \varepsilon/2N$, so that $\int_A |f|\,d\mu < \varepsilon$.

If $\mu$ is a measure and $\lambda$ a signed measure on the $\sigma$-field $\mathcal{F}$, $\lambda$ may be
neither absolutely continuous nor singular with respect to $\mu$. However, if $|\lambda|$
is $\sigma$-finite, the two concepts of absolute continuity and singularity are adequate
to describe the relation between $\lambda$ and $\mu$, in the sense that $\lambda$ can be written
as the sum of two signed measures, one absolutely continuous and the other
singular with respect to $\mu$.


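2.2.6 Lebesgue Decomposition Theorem. Let $\mu$ be a measure on the $\sigma$-field
$\mathcal{F}$, $\lambda$ a $\sigma$-finite signed measure (that is, $|\lambda|$ is $\sigma$-finite). Then $\lambda$ has a unique
decomposition as $\lambda_1 + \lambda_2$, where $\lambda_1$ and $\lambda_2$ are signed measures such that
$\lambda_1 \ll \mu$, $\lambda_2 \perp \mu$.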


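PROOF.¹ First assume that $\lambda$ is a finite measure, and let $\mathcal{C} = \{A \in \mathcal{F}: \mu(A) = 0\}$
and $s = \sup\{\lambda(A): A \in \mathcal{C}\} \le \lambda(\Omega) < \infty$. If $A_1, A_2, \ldots$ is a sequence of
sets in $\mathcal{C}$ such that $\lambda(A_n) \to s$, then $A^* = \bigcup_{n=1}^\infty A_n \in \mathcal{C}$ and $\lambda(A^*) = s$. We
claim that $\lambda(B - A^*) = 0$ for every $B \in \mathcal{C}$. For if $B \in \mathcal{C}$ and $\lambda(B - A^*) > 0$,
then

\[
\lambda(A^* \cup B) = \lambda(A^*) + \lambda(B - A^*) > s,
\]

a contradiction. Now define

\[
\lambda_1(A) = \lambda(A - A^*), \qquad \lambda_2(A) = \lambda(A \cap A^*), \qquad A \in \mathcal{F}.
\]

If $\mu(B) = 0$ then $B \in \mathcal{C}$, so $\lambda_1(B) = \lambda(B - A^*) = 0$, hence $\lambda_1 \ll \mu$. Since
$\mu(A^*) = 0$ and $\lambda_2((A^*)^c) = 0$, we have $\lambda_2 \perp \mu$.

If $\lambda$ is a $\sigma$-finite measure, let $\Omega$ be the disjoint union of sets $A_n$ such that
$\lambda(A_n) < \infty$. If $\lambda_n(A) = \lambda(A \cap A_n)$, $A \in \mathcal{F}$, then by the above argument there
are finite measures $\lambda_{n1} \ll \mu$ and $\lambda_{n2} \perp \mu$ such that $\lambda_n = \lambda_{n1} + \lambda_{n2}$. Sum on
$n$ to get $\lambda = \lambda_1 + \lambda_2$ with $\lambda_1 \ll \mu$, $\lambda_2 \perp \mu$.

Now if $\lambda$ is a $\sigma$-finite signed measure, the above argument applied to $\lambda^+$
and $\lambda^-$ proves the existence of the desired decomposition.

To prove uniqueness, first assume $\lambda$ finite. If $\lambda = \lambda_1 + \lambda_2 = \lambda_1' + \lambda_2'$, where
$\lambda_1, \lambda_1' \ll \mu$ and $\lambda_2, \lambda_2' \perp \mu$, then $\lambda_1 - \lambda_1' = \lambda_2' - \lambda_2$ is both absolutely
continuous and singular with respect to $\mu$; hence it is identically 0 by 2.2.5(d). If $\lambda$
is $\sigma$-finite and $\Omega$ is the disjoint union of sets $A_n$ with $|\lambda|(A_n) < \infty$, apply the
above argument to each $A_n$ and put the results together to obtain uniqueness
of $\lambda_1$ and $\lambda_2$. □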


¹ J. K. Brooks, American Mathematical Monthly, June-July 1971.

Note that as a corollary of the proof, if $\lambda$ is a $\sigma$-finite measure (as opposed
to a $\sigma$-finite signed measure), then $\lambda_1$ and $\lambda_2$ are measures. If $\lambda$ is a probability
measure, then $\lambda_1, \lambda_2 \le 1$.
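
On a finite set with point weights, the set $A^*$ of the proof can be taken to be the set of all points of $\mu$-weight 0, and the two pieces are obtained by restricting $\lambda$. The following is a minimal sketch under that assumption; the function name `lebesgue_decomposition` and the sample weights are hypothetical.

```python
def lebesgue_decomposition(lam, mu):
    """Split the point weights of lam into (lam1, lam2, a_star).

    a_star is the set of points of mu-weight 0. As in the proof,
    lam1(A) = lam(A - a_star) is absolutely continuous with respect to mu,
    and lam2(A) = lam(A & a_star) is singular with respect to mu.
    Assumes lam and mu are dictionaries with the same keys.
    """
    a_star = {w for w in mu if mu[w] == 0}
    lam1 = {w: (0 if w in a_star else lam[w]) for w in lam}
    lam2 = {w: (lam[w] if w in a_star else 0) for w in lam}
    return lam1, lam2, a_star

# Hypothetical finite measures given by nonnegative point weights.
mu = {"a": 1, "b": 0, "c": 2, "d": 0}
lam = {"a": 3, "b": 4, "c": 0, "d": 1}
lam1, lam2, a_star = lebesgue_decomposition(lam, mu)
```

In this example `lam2` lives entirely on `a_star`, a set of $\mu$-measure 0, while `lam1` vanishes wherever `mu` does.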


Problems 


1. Give an example of a measure $\mu$ and a nonnegative finite-valued Borel
measurable function $g$ such that the measure $\lambda$ defined by $\lambda(A) = \int_A g\,d\mu$
is not $\sigma$-finite.

2. If $\lambda(A) = \int_A g\,d\mu$, $A \in \mathcal{F}$, and $g$ is $\mu$-integrable, we know that $\lambda$ is finite;
in particular, $A = \{\omega: g(\omega) \ne 0\}$ has finite $\lambda$-measure. Show that $A$ has
$\sigma$-finite $\mu$-measure, that is, it is a countable union of sets of finite $\mu$-measure.
Give an example to show that $\mu(A)$ need not be finite.

3. Give an example in which the conclusion of the Radon-Nikodym theorem
fails; in other words, $\lambda \ll \mu$ but there is no Borel measurable $g$ such that
$\lambda(A) = \int_A g\,d\mu$ for all $A \in \mathcal{F}$. Of course $\mu$ cannot be $\sigma$-finite.

4. (A chain rule) Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and $g$ a nonnegative
Borel measurable function on $\Omega$. Define a measure $\lambda$ on $\mathcal{F}$ by

\[
\lambda(A) = \int_A g\,d\mu, \qquad A \in \mathcal{F}.
\]

Show that if $f$ is a Borel measurable function on $\Omega$,

\[
\int_\Omega f\,d\lambda = \int_\Omega fg\,d\mu
\]

in the sense that if one of the integrals exists, so does the other, and the
two integrals are equal. (Intuitively, $d\lambda/d\mu = g$, so that $d\lambda = g\,d\mu$.)

5. Show that Theorem 2.2.5(e) fails if $\lambda_1$ is not finite.

6. (Complex measures) If $(\Omega, \mathcal{F})$ is a measurable space, a complex measure
$\lambda$ on $\mathcal{F}$ is a countably additive complex-valued set function; that is,
$\lambda = \lambda_1 + i\lambda_2$, where $\lambda_1$ and $\lambda_2$ are finite signed measures.

(a) Define the total variation of $\lambda$ as

\[
|\lambda|(A) = \sup\Bigl\{\sum_{i=1}^{n} |\lambda(E_i)|: E_1, \ldots, E_n \text{ disjoint measurable subsets of } A,\ n = 1, 2, \ldots\Bigr\}.
\]

Show that $|\lambda|$ is a measure on $\mathcal{F}$. (The definition is consistent
with the earlier notion of total variation of a signed measure; see
Section 2.1, Problem 4.)

In the discussion below, $\lambda$'s, with various subscripts, denote arbitrary
measures (real signed measures or complex measures), and $\mu$
denotes a nonnegative real measure. We define $\lambda \ll \mu$ in the usual
way; if $A \in \mathcal{F}$ and $\mu(A) = 0$, then $\lambda(A) = 0$. Define $\lambda_1 \perp \lambda_2$ iff
$|\lambda_1| \perp |\lambda_2|$. Establish the following results.

(b) $|\lambda_1 + \lambda_2| \le |\lambda_1| + |\lambda_2|$; $|a\lambda| = |a|\,|\lambda|$ for any complex number $a$. In
particular if $\lambda = \lambda_1 + i\lambda_2$ is a complex measure, then
$|\lambda| \le |\lambda_1| + |\lambda_2|$; hence $|\lambda|(\Omega) < \infty$ by 2.1.3(b).

(c) If $\lambda_1 \perp \mu$ and $\lambda_2 \perp \mu$, then $\lambda_1 + \lambda_2 \perp \mu$.

(d) If $\lambda \ll \mu$, then $|\lambda| \ll \mu$, and conversely.

(e) If $\lambda_1 \ll \mu$ and $\lambda_2 \perp \mu$, then $\lambda_1 \perp \lambda_2$.

(f) If $\lambda \ll \mu$ and $\lambda \perp \mu$, then $\lambda \equiv 0$.

(g) If $\lambda$ is finite, then $\lambda \ll \mu$ iff $\lim_{\mu(A) \to 0} \lambda(A) = 0$.


2.3 APPLICATIONS TO REAL ANALYSIS 

We are going to apply the concepts of the previous section to some problems 
involving functions of a real variable. If $[a, b]$ is a closed bounded interval of
reals and $f: [a, b] \to \mathbb{R}$, $f$ is said to be absolutely continuous iff for each
$\varepsilon > 0$ there is a $\delta > 0$ such that for all positive integers $n$ and all families
$(a_1, b_1), \ldots, (a_n, b_n)$ of disjoint open subintervals of $[a, b]$ of total length at
most $\delta$, we have

\[
\sum_{i=1}^{n} |f(b_i) - f(a_i)| \le \varepsilon.
\]

It is immediate that this property holds also for countably infinite families
of disjoint open intervals of total length at most $\delta$. It also follows from the
definition that $f$ is continuous.

We can connect absolute continuity of functions with the earlier notion of 
absolute continuity of measures, as follows. 


2.3.1 Theorem. Suppose that $F$ and $G$ are distribution functions on $[a, b]$,
with corresponding (finite) Lebesgue-Stieltjes measures $\mu_1$ and $\mu_2$. Let
$f = F - G$, $\mu = \mu_1 - \mu_2$, so that $\mu$ is a finite signed measure on $\mathcal{B}[a, b]$,
with $\mu(x, y] = f(y) - f(x)$, $x < y$. If $m$ is Lebesgue measure on $\mathcal{B}[a, b]$, then
$\mu \ll m$ iff $f$ is absolutely continuous.


PROOF. Assume $\mu \ll m$. If $\varepsilon > 0$, by 2.2.5(b) and (e), there is a $\delta > 0$ such
that $m(A) \le \delta$ implies $|\mu|(A) \le \varepsilon$. Thus if $(a_1, b_1), \ldots, (a_n, b_n)$ are disjoint
open intervals of total length at most $\delta$, with union $A$,

\[
\sum_{i=1}^{n} |f(b_i) - f(a_i)| = \sum_{i=1}^{n} |\mu(a_i, b_i)| \le \sum_{i=1}^{n} |\mu|(a_i, b_i) \le |\mu|(A) \le \varepsilon.
\]

(Note that $\mu\{b_i\} = 0$ since $\mu \ll m$.) Therefore $f$ is absolutely continuous.

Now assume $f$ absolutely continuous; if $\varepsilon > 0$, choose $\delta > 0$ as in the
definition of absolute continuity. If $m(A) = 0$, we must show that $\mu(A) = 0$. We
use 1.4.11:

\[
\begin{aligned}
m(A) &= \inf\{m(V): V \supset A,\ V \text{ open}\}, \\
\mu_i(A) &= \inf\{\mu_i(V): V \supset A,\ V \text{ open}\}, \qquad i = 1, 2.
\end{aligned}
\]

(This problem assumes that the measures are defined on $\mathcal{B}(\mathbb{R})$ rather than
$\mathcal{B}[a, b]$. The easiest way out is to extend all measures to $\mathcal{B}(\mathbb{R})$ by assigning
measure 0 to $\mathbb{R} - [a, b]$.) Since a finite intersection of open sets is open, we
can find a decreasing sequence $\{V_n\}$ of open sets such that $\mu(V_n) \to \mu(A)$
and $m(V_n) \to m(A) = 0$.

Choose $n$ large enough so that $m(V_n) < \delta$; if $V_n$ is the disjoint union of
the open intervals $(a_i, b_i)$, $i = 1, 2, \ldots$, then $|\mu(V_n)| \le \sum_i |\mu(a_i, b_i)|$. But $f$
is continuous, hence

\[
\mu\{b_i\} = \lim_{n \to \infty} \mu(b_i - 1/n, b_i] = \lim_{n \to \infty} [f(b_i) - f(b_i - 1/n)] = 0.
\]

Therefore

\[
|\mu(V_n)| \le \sum_i |\mu(a_i, b_i]| = \sum_i |f(b_i) - f(a_i)| \le \varepsilon.
\]

Since $\varepsilon$ is arbitrary and $\mu(V_n) \to \mu(A)$, we have $\mu(A) = 0$. □


If $f: \mathbb{R} \to \mathbb{R}$, absolute continuity of $f$ is defined exactly as above. If $F$
and $G$ are bounded distribution functions on $\mathbb{R}$ with corresponding
Lebesgue-Stieltjes measures $\mu_1$ and $\mu_2$, and $f = F - G$, $\mu = \mu_1 - \mu_2$ [a finite signed
measure on $\mathcal{B}(\mathbb{R})$], then $f$ is absolutely continuous iff $\mu$ is absolutely
continuous with respect to Lebesgue measure; the proof is the same as in 2.3.1.

Any absolutely continuous function on [a,b] can be represented as the 
difference of two absolutely continuous increasing functions. We prove this 
in a sequence of steps. 

If $f: [a, b] \to \mathbb{R}$ and $P: a = x_0 < x_1 < \cdots < x_n = b$ is a partition of $[a, b]$,
define

\[
V(P) = \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|.
\]

The sup of $V(P)$ over all partitions of $[a, b]$ is called the variation of $f$ on
$[a, b]$, written $V_f(a, b)$, or simply $V(a, b)$ if $f$ is understood. We say that $f$ is
of bounded variation on $[a, b]$ iff $V(a, b) < \infty$. If $a < c < b$, a brief argument
shows that $V(a, b) = V(a, c) + V(c, b)$.
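
The sum $V(P)$ is straightforward to compute for a given partition. The sketch below uses a hypothetical example function, $f(x) = |x|$ on $[-1, 2]$, to illustrate that refining a partition can increase $V(P)$ but never decrease it; the names `variation`, `coarse`, and `fine` are illustrative only.

```python
def variation(f, partition):
    """V(P) = sum of |f(x_i) - f(x_{i-1})| over consecutive partition points."""
    return sum(abs(f(b) - f(a)) for a, b in zip(partition, partition[1:]))

def f(x):
    """Hypothetical example function on [-1, 2]."""
    return abs(x)

coarse = [-1, 2]       # the trivial partition of [-1, 2]
fine = [-1, 0, 2]      # a refinement that adds the turning point 0

v_coarse = variation(f, coarse)
v_fine = variation(f, fine)
```

Since $|x|$ is monotone on each of $[-1, 0]$ and $[0, 2]$, the value obtained from `fine` already equals $V(-1, 0) + V(0, 2)$, in line with the additivity of the variation over adjacent intervals.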


2.3.2 Lemma. If $f: [a, b] \to \mathbb{R}$ and $f$ is absolutely continuous on $[a, b]$,
then $f$ is of bounded variation on $[a, b]$.

PROOF. Pick any $\varepsilon > 0$, and let $\delta > 0$ be chosen as in the definition of absolute
continuity. If $P$ is any partition of $[a, b]$, there is a refinement $Q$ of $P$ consisting
of subintervals of length less than $\delta/2$. If $Q: a = x_0 < x_1 < \cdots < x_n = b$, let
$i_0 = 0$, and let $i_1$ be the largest integer such that $x_{i_1} - x_{i_0} < \delta$; let $i_2$ be the
largest integer greater than $i_1$ such that $x_{i_2} - x_{i_1} < \delta$, and continue in this
fashion until the process terminates, say with $i_r = n$. Now $x_{i_k} - x_{i_{k-1}} \ge \delta/2$,
$k = 1, 2, \ldots, r - 1$, by construction of $Q$; hence

\[
(r - 1)\frac{\delta}{2} \le b - a, \quad\text{so}\quad r \le 1 + \frac{2(b - a)}{\delta} = M.
\]

By absolute continuity, $V(Q) \le M\varepsilon$. But $V(P) \le V(Q)$ since the refining
process can never decrease $V$; the result follows. □


It is immediate that a monotone function $f$ on $[a, b]$ is of bounded variation:
$V(a, b) = |f(b) - f(a)|$. Thus if $f = F - G$, where $F$ and $G$ are increasing,
then $f$ is of bounded variation. The converse is also true.


2.3.3 Lemma. If $f: [a, b] \to \mathbb{R}$ and $f$ is of bounded variation on $[a, b]$, then
there are increasing functions $F$ and $G$ on $[a, b]$ such that $f = F - G$. If $f$ is
absolutely continuous, $F$ and $G$ may also be taken as absolutely continuous.

PROOF. Let $F(x) = V_f(a, x)$, $a \le x \le b$; $F$ is increasing, for if $h \ge 0$,
$V(a, x + h) - V(a, x) = V(x, x + h) \ge 0$. If $G(x) = F(x) - f(x)$, then $G$ is
also increasing. For if $x_1 < x_2$, then

\[
\begin{aligned}
G(x_2) - G(x_1) &= F(x_2) - F(x_1) - (f(x_2) - f(x_1)) \\
                &= V(x_1, x_2) - (f(x_2) - f(x_1)) \\
                &\ge V(x_1, x_2) - |f(x_2) - f(x_1)| \\
                &\ge 0 \qquad\text{by definition of } V(x_1, x_2).
\end{aligned}
\]

Now assume $f$ absolutely continuous. If $\varepsilon > 0$, choose $\delta > 0$ as in the
definition of absolute continuity. Let $(a_1, b_1), \ldots, (a_n, b_n)$ be disjoint open intervals
with total length at most $\delta$. If $P_i$ is a partition of $[a_i, b_i]$, $i = 1, 2, \ldots, n$, then

\[
\sum_{i=1}^{n} V(P_i) \le \varepsilon \qquad\text{by absolute continuity of } f.
\]

Take the sup successively over $P_1, \ldots, P_n$ to obtain

\[
\sum_{i=1}^{n} V(a_i, b_i) \le \varepsilon;
\]

in other words,

\[
\sum_{i=1}^{n} [F(b_i) - F(a_i)] \le \varepsilon.
\]

Therefore $F$ is absolutely continuous. Since sums and differences of
absolutely continuous functions are absolutely continuous, $G$ is also absolutely
continuous. □
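
The construction $F(x) = V_f(a, x)$, $G = F - f$ has a direct discrete analogue when $f$ is known only through samples on a grid. The following sketch works with the variation of the sampled sequence, which in general is only a lower bound for the true variation of $f$; the function name `jordan_split` and the sample values are hypothetical.

```python
from itertools import accumulate

def jordan_split(values):
    """Given samples f(x_0), ..., f(x_n), return (F, G) on the same grid.

    F[i] is the variation of the sampled sequence up to index i, and
    G = F - f, mirroring F(x) = V_f(a, x) and G = F - f in the proof.
    """
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    F = [0] + list(accumulate(steps))
    G = [Fi - fi for Fi, fi in zip(F, values)]
    return F, G

values = [0, 2, 1, 4, 3]   # hypothetical samples of f
F, G = jordan_split(values)
```

Both `F` and `G` come out nondecreasing, and `F[i] - G[i]` recovers `values[i]` at every grid point.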


We have seen that there is a close connection between absolute continuity 
and indefinite integrals, via the Radon—Nikodym theorem. The connection 
carries over to real analysis, as follows. 


2.3.4 Theorem. Let $f: [a, b] \to \mathbb{R}$. Then $f$ is absolutely continuous on $[a, b]$
iff $f$ is an indefinite integral, that is, iff

\[
f(x) - f(a) = \int_a^x g(t)\,dt, \qquad a \le x \le b,
\]

where $g: [a, b] \to \mathbb{R}$ is Borel measurable and integrable with respect to
Lebesgue measure.

PROOF. First assume $f$ absolutely continuous. By 2.3.3, it is sufficient to
assume $f$ increasing. If $\mu$ is the Lebesgue-Stieltjes measure corresponding
to $f$, and $m$ is Lebesgue measure, then $\mu \ll m$ by 2.3.1. By the
Radon-Nikodym theorem, there is an $m$-integrable function $g$ such that
$\mu(A) = \int_A g\,dm$ for all Borel subsets $A$ of $[a, b]$. Take $A = [a, x]$ to obtain
$f(x) - f(a) = \int_a^x g(t)\,dt$.

Conversely, assume $f(x) - f(a) = \int_a^x g(t)\,dt$. It is sufficient to assume
$g \ge 0$ (if not, consider $g^+$ and $g^-$ separately). Define $\mu(A) = \int_A g\,dm$,
$A \in \mathcal{B}[a, b]$; then $\mu \ll m$, and if $F$ is a distribution function corresponding
to $\mu$, $F$ is absolutely continuous by 2.3.1. But

\[
F(x) - F(a) = \mu(a, x] = \int_a^x g(t)\,dt = f(x) - f(a).
\]

Therefore $f$ is absolutely continuous. □


If $g$ is Lebesgue integrable on $\mathbb{R}$, the "if" part of the proof of 2.3.4 shows
that the function defined by $\int_{-\infty}^x g(t)\,dt$, $x \in \mathbb{R}$, is absolutely continuous, hence
continuous, on $\mathbb{R}$. Another way of proving continuity is to observe that

\[
\int_{-\infty}^{x+h} g(t)\,dt - \int_{-\infty}^{x} g(t)\,dt = \int_{-\infty}^{\infty} g(t) I_{(x, x+h)}(t)\,dt
\]

if $h > 0$, and this approaches 0 as $h \to 0$, by the dominated convergence
theorem.

If $f(x) - f(a) = \int_a^x g(t)\,dt$, $a \le x \le b$, and $g$ is continuous at $x$, then $f$
is differentiable at $x$ and $f'(x) = g(x)$; the proof given in calculus carries
over. If the continuity hypothesis is dropped, we can prove that $f'(x) = g(x)$
for almost every $x \in [a, b]$. One approach to this result is via the theory of
differentiation of measures, which we now describe.


2.3.5 Definition. For the remainder of this section, $\mu$ is a signed measure on
the Borel sets of $\mathbb{R}^k$, assumed finite on bounded sets; thus if $\mu$ is nonnegative,
it is a Lebesgue-Stieltjes measure. If $m$ is Lebesgue measure, we define, for
each $x \in \mathbb{R}^k$,

\[
(\overline{D}\mu)(x) = \lim_{r \to 0} \sup_{C_r} \frac{\mu(C_r)}{m(C_r)}, \qquad
(\underline{D}\mu)(x) = \lim_{r \to 0} \inf_{C_r} \frac{\mu(C_r)}{m(C_r)},
\]

where the $C_r$ range over all open cubes of diameter less than $r$ that contain
$x$. It will be convenient (although not essential) to assume that all cubes have
edges parallel to the coordinate axes.

We say that $\mu$ is differentiable at $x$ iff $\overline{D}\mu$ and $\underline{D}\mu$ are equal and finite
at $x$; we write $(D\mu)(x)$ for the common value. Thus $\mu$ is differentiable at $x$
iff for every sequence $\{C_n\}$ of open cubes containing $x$, with the diameter of
$C_n$ approaching 0, $\mu(C_n)/m(C_n)$ approaches a finite limit, independent of the
particular sequence.


The following result will play an important role. 
2.3.6 Lemma. If $\{C_1, \ldots, C_n\}$ is a family of open cubes in $R^k$, there is a disjoint subfamily $\{C_{i_1}, \ldots, C_{i_s}\}$ such that $m\left(\bigcup_{j=1}^n C_j\right) \le 3^k \sum_{p=1}^s m(C_{i_p})$.


Proof. Assume that the diameter of $C_i$ decreases with i. Set $i_1 = 1$, and take $i_2$ to be the smallest index greater than $i_1$ such that $C_{i_2}$ is disjoint from $C_{i_1}$; let $i_3$ be the smallest index greater than $i_2$ such that $C_{i_3}$ is disjoint from $C_{i_1} \cup C_{i_2}$. Continue in this fashion to obtain disjoint sets $C_{i_1}, \ldots, C_{i_s}$. Now for any $j = 1, \ldots, n$, we have $C_j \cap C_{i_p} \ne \emptyset$ for some $i_p \le j$, for if not, j is not one of the $i_p$, hence $i_p < j < i_{p+1}$ for some p (or $i_s < j$). But $C_j \cap (C_{i_1} \cup \cdots \cup C_{i_p})$ is assumed empty, contradicting the definition of $i_{p+1}$.

If $B_p$ is the open cube with the same center as $C_{i_p}$ and diameter three times as large, then since $C_j \cap C_{i_p} \ne \emptyset$ and diameter $C_j \le$ diameter $C_{i_p}$, we have $C_j \subset B_p$. Therefore,

$$m\left(\bigcup_{j=1}^n C_j\right) \le m\left(\bigcup_{p=1}^s B_p\right) \le \sum_{p=1}^s m(B_p) = 3^k \sum_{p=1}^s m(C_{i_p}). \quad □$$
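The selection procedure in the proof of 2.3.6 is a greedy algorithm, and it may help to see it run on a small family. The following is a minimal sketch, not part of the original text: cubes are represented as hypothetical pairs (center, edge length), the helper names are illustrative, and the volume of the union is computed by a simple coordinate-compression routine that is exact only up to floating-point rounding.

```python
from itertools import product

def intersects(c1, c2):
    """Open axis-parallel cubes (center, edge) meet iff they overlap on every axis."""
    (x, a), (y, b) = c1, c2
    return all(abs(xi - yi) < (a + b) / 2 for xi, yi in zip(x, y))

def greedy_disjoint_subfamily(cubes):
    """Indices i_1, ..., i_s selected as in the proof of 2.3.6."""
    # Process cubes in order of decreasing diameter (equivalently, edge length).
    order = sorted(range(len(cubes)), key=lambda i: -cubes[i][1])
    chosen = []
    for i in order:
        # Keep a cube only if it is disjoint from every cube kept so far.
        if all(not intersects(cubes[i], cubes[j]) for j in chosen):
            chosen.append(i)
    return chosen

def volume(cube):
    center, edge = cube
    return edge ** len(center)

def union_volume(cubes):
    """Lebesgue measure of a finite union of cubes, via coordinate compression."""
    k = len(cubes[0][0])
    boxes = [[(c - e / 2, c + e / 2) for c in center] for center, e in cubes]
    grids = [sorted({b[d][i] for b in boxes for i in (0, 1)}) for d in range(k)]
    total = 0.0
    for idx in product(*(range(len(g) - 1) for g in grids)):
        lo = [grids[d][idx[d]] for d in range(k)]
        hi = [grids[d][idx[d] + 1] for d in range(k)]
        mid = [(l + h) / 2 for l, h in zip(lo, hi)]
        # A grid cell lies in the union iff its midpoint lies in some cube.
        if any(all(b[d][0] < mid[d] < b[d][1] for d in range(k)) for b in boxes):
            cell = 1.0
            for l, h in zip(lo, hi):
                cell *= h - l
            total += cell
    return total

# Four open squares in the plane (k = 2); the data are made up for illustration.
cubes = [((0.0, 0.0), 2.0), ((1.0, 1.0), 2.0), ((5.0, 5.0), 1.0), ((0.5, 0.0), 1.0)]
chosen = greedy_disjoint_subfamily(cubes)
lhs = union_volume(cubes)
rhs = 3 ** 2 * sum(volume(cubes[i]) for i in chosen)
print(chosen, lhs, rhs)
```

In this example the bound is far from tight; the factor $3^k$ is what the enlargement argument with the cubes $B_p$ delivers in general.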


We now prove the first differentiation result. 


2.3.7 Lemma. Let $\mu$ be a Lebesgue-Stieltjes measure on the Borel sets of $R^k$. If $\mu(A) = 0$, then $D\mu = 0$ a.e. [m] on A.


Proof. If a > 0, let $B = \{x \in A: (\overline{D}\mu)(x) > a\}$. [Note that $\{x: \sup_{C_r} \mu(C_r)/m(C_r) > a\}$ is open, and it follows that B is a Borel set.] Fix r > 0, and let K be a compact subset of B. If $x \in K$, there is an open cube $C_r$ of diameter less than r with $x \in C_r$ and $\mu(C_r) > a\,m(C_r)$. By compactness, K is covered by finitely many of the cubes, say $C_1, \ldots, C_n$. If $\{C_{i_1}, \ldots, C_{i_s}\}$ is the subcollection of 2.3.6, we have

$$m(K) \le m\left(\bigcup_{j=1}^n C_j\right) \le 3^k \sum_{p=1}^s m(C_{i_p}) \le \frac{3^k}{a} \sum_{p=1}^s \mu(C_{i_p}) = \frac{3^k}{a}\, \mu\left(\bigcup_{p=1}^s C_{i_p}\right) \le \frac{3^k}{a}\, \mu(K_r),$$

where $K_r = \{x \in R^k: \operatorname{dist}(x, K) < r\}$. Since r is arbitrary, we have $m(K) \le 3^k \mu(K)/a \le 3^k \mu(A)/a = 0$. Take the sup over K to obtain, by 1.4.11, m(B) = 0, and since a is arbitrary, it follows that $\overline{D}\mu \le 0$ a.e. [m] on A. But $\mu \ge 0$; hence $0 \le \underline{D}\mu \le \overline{D}\mu$, so that $D\mu = 0$ a.e. [m] on A. □
We are going to show that $D\mu$ exists a.e. [m], and to do this the Lebesgue decomposition theorem is helpful. We write $\mu = \mu_1 + \mu_2$, where $\mu_1 \ll m$, $\mu_2 \perp m$. If $|\mu_2|(A) = 0$ and $m(A^c) = 0$, then by 2.3.7, $D\mu_2^+ = D\mu_2^- = 0$ a.e. [m] on A; hence a.e. [m] on $R^k$. Thus $D\mu_2 = 0$ a.e. [m] on $R^k$.

By the Radon-Nikodym theorem, we have $\mu_1(E) = \int_E g\,dm$, $E \in \mathcal{B}(R^k)$, for some Borel measurable function g. As might be expected intuitively, g is (a.e.) the derivative of $\mu_1$; hence $D\mu = g$ a.e. [m].


2.3.8 Theorem. Let $\mu$ be a signed measure on $\mathcal{B}(R^k)$ that is finite on bounded sets, and let $\mu = \mu_1 + \mu_2$, where $\mu_1 \ll m$ and $\mu_2 \perp m$. Then $D\mu$ exists a.e. [m] and coincides a.e. [m] with the Radon-Nikodym derivative $g = d\mu_1/dm$.


Proof. If $a \in R$ and C is an open cube of diameter less than r,

$$\mu_1(C) - a\,m(C) = \int_C (g - a)\,dm \le \int_{C \cap \{g \ge a\}} (g - a)\,dm.$$

If $\lambda(E) = \int_{E \cap \{g \ge a\}} (g - a)\,dm$, $E \in \mathcal{B}(R^k)$, and $A = \{g < a\}$, then $\lambda(A) = 0$; so by 2.3.7, $D\lambda = 0$ a.e. [m] on A. But

$$\frac{\mu_1(C)}{m(C)} \le a + \frac{\lambda(C)}{m(C)};$$

hence $\overline{D}\mu_1 \le a$ a.e. [m] on A. Therefore, if $E_a = \{x \in R^k: g(x) < a < (\overline{D}\mu_1)(x)\}$, then $m(E_a) = 0$. Since $\{\overline{D}\mu_1 > g\} \subset \bigcup \{E_a: a \text{ rational}\}$, we have $\overline{D}\mu_1 \le g$ a.e. [m]. Replace $\mu_1$ by $-\mu_1$ and g by $-g$ to obtain $\underline{D}\mu_1 \ge g$ a.e. [m]. By 2.2.2(b), g is finite a.e. [m], and the result follows. □


We now return to functions on the real line. 


2.3.9 Theorem. Let $f: [a, b] \to R$ be an increasing function. Then the derivative of f exists at almost every point of [a, b] (with respect to Lebesgue measure). Thus by 2.3.3, a function of bounded variation is differentiable almost everywhere.


Proof. Since f has only countably many discontinuities, we may assume without loss of generality that f takes the upper value at a discontinuity and is therefore a distribution function.

Let $\mu$ be the Lebesgue-Stieltjes measure corresponding to f; by 2.3.8, $D\mu$ exists a.e. [m]; we show that $D\mu = f'$ a.e. [m]. If $a \le x \le b$ and $\mu$ is differentiable at x, then f is continuous at x by definition of $\mu$. If $\lim_{h \to 0} [f(x + h) - f(x)]/h \ne (D\mu)(x) = c$, there is an $\varepsilon > 0$ and a sequence $h_n \to 0$ with all $h_n$ of the same sign and $|[f(x + h_n) - f(x)]/h_n - c| \ge \varepsilon$ for all n. Assuming all $h_n > 0$, we can find numbers $k_n > 0$ such that

$$\left| \frac{f(x + h_n) - f(x - k_n)}{h_n + k_n} - c \right| \ge \frac{\varepsilon}{2}$$

for all n, and since f has only countably many discontinuities, it may be assumed that f is continuous at $x + h_n$ and $x - k_n$. Thus we conclude that $\mu(x - k_n, x + h_n)/(h_n + k_n) \not\to c$, a contradiction. □


We now prove the main theorem on absolutely continuous functions. 


2.3.10 Theorem. Let f be absolutely continuous on [a, b], with $f(x) - f(a) = \int_a^x g(t)\,dt$, as in 2.3.4. Then $f' = g$ almost everywhere on [a, b] (Lebesgue measure). Thus by 2.3.4, f is absolutely continuous iff f is the integral of its derivative, that is,

$$f(x) - f(a) = \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$


Proof. We may assume $g \ge 0$ (if not, consider $g^+$ and $g^-$). If $\mu_1(A) = \int_A g\,dm$, $A \in \mathcal{B}(R^1)$, then $D\mu_1 = g$ a.e. [m] by 2.3.8. But if $a \le x < y \le b$, then $\mu_1(x, y] = f(y) - f(x)$, so that $\mu_1$ is the Lebesgue-Stieltjes measure corresponding to f. Thus by the proof of 2.3.9, $D\mu_1 = f'$ a.e. [m]. □


Problems 


1. Let F be a bounded distribution function on R. Use the Lebesgue decomposition theorem to show that F may be represented uniquely (up to additive constants) as $F_1 + F_2 + F_3$, where the distribution functions $F_j$, j = 1, 2, 3 (and the corresponding Lebesgue-Stieltjes measures $\mu_j$) have the following properties:


(a) $F_1$ is discrete (that is, $\mu_1$ is concentrated on a countable set of points).

(b) $F_2$ is absolutely continuous ($\mu_2$ is absolutely continuous with respect to Lebesgue measure; see 2.3.1).

(c) $F_3$ is continuous and singular (that is, $\mu_3$ is singular with respect to Lebesgue measure).


2. If f is an increasing function from [a, b] to R, show that $\int_a^b f'(x)\,dx \le f(b) - f(a)$. The inequality may be strict, as Problem 3 shows. (Note that by 2.3.9, $f'$ exists a.e.; for integration purposes, $f'$ may be defined arbitrarily on the exceptional set of Lebesgue measure 0.)


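3. (The Cantor function) Let $E_1, E_2, \ldots$ be the sets removed from [0, 1] to form the Cantor ternary set (see Problem 7, Section 1.4). Define functions $F_n: [0, 1] \to [0, 1]$ as follows: Let $A_1, A_2, \ldots, A_{2^n - 1}$ be the subintervals of $\bigcup_{i=1}^n E_i$, arranged in increasing order. For example, if n = 3,

$$E_1 \cup E_2 \cup E_3 = \left(\tfrac{1}{27}, \tfrac{2}{27}\right) \cup \left(\tfrac{1}{9}, \tfrac{2}{9}\right) \cup \left(\tfrac{7}{27}, \tfrac{8}{27}\right) \cup \left(\tfrac{1}{3}, \tfrac{2}{3}\right) \cup \left(\tfrac{19}{27}, \tfrac{20}{27}\right) \cup \left(\tfrac{7}{9}, \tfrac{8}{9}\right) \cup \left(\tfrac{25}{27}, \tfrac{26}{27}\right) = A_1 \cup A_2 \cup \cdots \cup A_7.$$

Define

$F_n(0) = 0$,
$F_n(x) = k/2^n$ if $x \in A_k$, $k = 1, 2, \ldots, 2^n - 1$,
$F_n(1) = 1$.

Complete the specification of $F_n$ by interpolating linearly. For n = 2, see Fig. 2.3.1; in this case,

$$E_1 \cup E_2 = \left(\tfrac{1}{9}, \tfrac{2}{9}\right) \cup \left(\tfrac{1}{3}, \tfrac{2}{3}\right) \cup \left(\tfrac{7}{9}, \tfrac{8}{9}\right) = A_1 \cup A_2 \cup A_3.$$

Show that $F_n(x) \to F(x)$ for each x, where F, the Cantor function, has the following properties:
(a) F is continuous and increasing.
(b) $F' = 0$ almost everywhere (Lebesgue measure).
(c) F is not absolutely continuous.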


In fact
(d) F is singular; that is, the corresponding Lebesgue-Stieltjes measure $\mu$ is singular with respect to Lebesgue measure.

Figure 2.3.1. Approximation to the Cantor function.


4. Let f be a Lebesgue integrable real-valued function on $R^k$ (or on an open subset of $R^k$). If $\mu(E) = \int_E f(x)\,dx$, $E \in \mathcal{B}(R^k)$, we know that $D\mu = f$ a.e. (Lebesgue measure). If $D\mu = f$ at $x_0$, then if C is an open cube containing $x_0$ and diam $C \to 0$, we have $\mu(C)/m(C) \to f(x_0)$; that is,

$$\frac{1}{m(C)} \int_C [f(x) - f(x_0)]\,dx \to 0 \quad \text{as diam } C \to 0.$$

In fact, show that

$$\frac{1}{m(C)} \int_C |f(x) - f(x_0)|\,dx \to 0 \quad \text{as diam } C \to 0$$

for almost every $x_0$. The set of favorable $x_0$ is called the Lebesgue set of f.

5. This problem relates various concepts discussed in Section 2.3. In all cases, f is a real-valued function defined on the closed bounded interval [a, b]. Establish the following:


(a) If f is continuous, f need not be of bounded variation.

(b) If f is continuous and increasing (hence of bounded variation), f need not be absolutely continuous.

(c) If f satisfies a Lipschitz condition, that is, $|f(x) - f(y)| \le L|x - y|$ for some fixed positive number L and all $x, y \in [a, b]$, then f is absolutely continuous.

(d) If $f'$ exists everywhere and is bounded, f is absolutely continuous. (It can also be shown that if $f'$ exists everywhere and is Lebesgue integrable on [a, b], then f is absolutely continuous; see Titchmarsh, 1939, p. 368.)

(e) If f is continuous and $f'$ exists everywhere, f need not be absolutely continuous [consider $f(x) = x^2 \sin(1/x^2)$, $0 < x \le 1$, f(0) = 0].


6. The following problem considers the change of variable formula in a multiple integral. Throughout the problem, T will be a map from V onto W, where V and W are open subsets of $R^k$, T is assumed one-to-one, continuously differentiable, with a nonzero Jacobian. Thus T has a continuously differentiable inverse, by the inverse function theorem of advanced calculus [see, for example, Apostol (1957, p. 144)]. It also follows from standard advanced calculus results that for all $x \in V$,

$$\frac{1}{|h|}\, |T(x + h) - T(x) - A(x)h| \to 0 \quad \text{as } h \to 0, \qquad (1)$$

where A(x) is the linear transformation on $R^k$ represented by the Jacobian matrix of T, evaluated at x. [See Apostol (1957, p. 118).]

(a) Let A be a nonsingular linear transformation on $R^k$, and define a measure $\lambda$ on $\mathcal{B}(R^k)$ by $\lambda(E) = m(A(E))$, where m is Lebesgue measure. Show that $\lambda = c(A)m$ for some constant c(A), and in fact c(A) is the absolute value of the determinant of A. [Use translation-invariance of Lebesgue measure (Problem 5, Section 1.4) and the fact that any matrix can be represented as a product of matrices corresponding to elementary row operations.]

Now define a measure $\mu$ on $\mathcal{B}(V)$ by $\mu(E) = m(T(E))$. By continuity of T, if $\varepsilon > 0$, $x \in V$, and C is a sufficiently small open cube containing x, then T(C) has diameter less than $\varepsilon$; in particular, $m(T(C)) < \infty$. It follows by a brief compactness argument that $\mu$ is a Lebesgue-Stieltjes measure on $\mathcal{B}(V)$.

Our objective is to show that $\mu$ is differentiable and $(D\mu)(x) = |J(x)|$ for every $x \in V$, where $J(x) = \det A(x)$, the Jacobian of the transformation T.

(b) Show that it suffices to prove that if $0 \in V$ and T(0) = 0, then $(D\mu)(0) = |\det A(0)|$.

(c) Show that it may be assumed without loss of generality that A(0) is the identity transformation; hence det A(0) = 1.

Now given $\varepsilon > 0$, choose $\alpha \in (0, \tfrac{1}{4})$ such that

$$1 - \varepsilon < (1 - 2\alpha)^k < (1 + 2\alpha)^k < 1 + \varepsilon.$$

Under the assumptions of (b) and (c), by Eq. (1), there is a $\delta > 0$ such that if $|x| < \delta$, then $|T(x) - x| \le \alpha |x| / \sqrt{k}$.

(d) If C is an open cube containing 0 with edge length $\beta$ and diameter $\sqrt{k}\,\beta < \delta$, take $C_1$, $C_2$ as open cubes concentric with C, with edge lengths $\beta_1 = (1 - 2\alpha)\beta$ and $\beta_2 = (1 + 2\alpha)\beta$. Establish the following:

(i) If $x \in \overline{C}$, then $T(x) \in C_2$.
(ii) If x belongs to the boundary of C, then $T(x) \notin C_1$.
(iii) If x is the center of C, then $T(x) \in C_1$.
(iv) $C_1 - T(C) = C_1 - T(\overline{C})$.

Use a connectedness argument to conclude that $C_1 \subset T(C) \subset C_2$, and complete the proof that $(D\mu)(0) = 1$.

(e) If $\lambda$ is any measure on $\mathcal{B}(V)$ and $\overline{D}\lambda < \infty$ on V, show that $\lambda$ is absolutely continuous with respect to Lebesgue measure. It therefore follows from Theorem 2.3.8 that

$$m(T(E)) = \int_E |J(x)|\,dx, \qquad E \in \mathcal{B}(V).$$

[If this is false, find a compact set K and positive integers n and j such that m(K) = 0, $\lambda(K) > 0$, and $\lambda(C) \le n\,m(C)$ for all open cubes C containing a point of K and having diameter less than 1/j. Essentially, the idea is to cover K by such cubes and conclude that $\lambda(K) = 0$, a contradiction.]

(f) If f is a real-valued Borel measurable function on W, show that

$$\int_W f(y)\,dy = \int_V f(T(x))\,|J(x)|\,dx$$

in the sense that if one of the two integrals exists, so does the other, and the two integrals are equal.


7. (Fubini's differentiation theorem) Let $f_1, f_2, \ldots$ be increasing functions from R to R, and assume that for each x, $\sum_{n=1}^\infty f_n(x)$ converges to a finite number f(x). Show that $\sum_{n=1}^\infty f_n'(x) = f'(x)$ almost everywhere (Lebesgue measure).


Outline:

(a) It suffices to restrict the domain of all functions to [0, 1] and to assume all functions nonnegative. Use Fatou's lemma to show that $\sum_{n=1}^\infty f_n'(x) \le f'(x)$ a.e.; hence $f_n'(x) \to 0$ a.e.

(b) Choose $n_1, n_2, \ldots$ such that $\sum_{n > n_k} f_n(1) \le 2^{-k}$, $k = 1, 2, \ldots$, and apply part (a) to the functions $g_k(x) = f(x) - \sum_{n=1}^{n_k} f_n(x) = \sum_{n > n_k} f_n(x)$.
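The approximations $F_n$ of Problem 3 can be explored numerically. The sketch below is an illustration only and rests on an assumption not stated in the text: starting from $F_0(x) = x$, the piecewise-linear functions of Problem 3 satisfy the self-similar recursion $F_n(x) = F_{n-1}(3x)/2$ on [0, 1/3], $F_n(x) = 1/2$ on [1/3, 2/3], and $F_n(x) = 1/2 + F_{n-1}(3x - 2)/2$ on [2/3, 1]. The function name is hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def cantor_approx(n, x):
    """F_n(x) for x in [0, 1], via F_0(x) = x and the self-similar recursion."""
    x = Fraction(x)
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        # Left third: a half-height copy of F_{n-1}.
        return cantor_approx(n - 1, 3 * x) / 2
    if x >= Fraction(2, 3):
        # Right third: a half-height copy of F_{n-1}, shifted up by 1/2.
        return Fraction(1, 2) + cantor_approx(n - 1, 3 * x - 2) / 2
    # Middle third (the removed interval E_1): constant value 1/2.
    return Fraction(1, 2)

# F_3 takes the value 1/8 on A_1 = (1/27, 2/27) and 3/4 on (7/9, 8/9).
print(cantor_approx(3, Fraction(1, 18)), cantor_approx(3, Fraction(5, 6)))

# The values F_n(1/4) approach 1/3 as n grows.
print([cantor_approx(n, Fraction(1, 4)) for n in range(6)])
```

The recursion halves the sup-distance between successive approximations, which is one way to see that the $F_n$ converge uniformly.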


2.4 $L^p$ Spaces

If $(\Omega, \mathcal{F}, \mu)$ is a measure space and p is a real number with $p \ge 1$, the set of all Borel measurable functions f such that $|f|^p$ is $\mu$-integrable has many important properties. In order to fully develop these properties, it will be convenient to work with complex-valued functions.


2.4.1 Definitions. Let $(\Omega, \mathcal{F})$ be a measurable space, and let f be a complex-valued function on $\Omega$, so that f = Re f + i Im f. We say that f is a complex-valued Borel measurable function on $(\Omega, \mathcal{F})$ if both Re f and Im f are real-valued Borel measurable functions. If $\mu$ is a measure on $\mathcal{F}$, we define

$$\int_\Omega f\,d\mu = \int_\Omega \operatorname{Re} f\,d\mu + i \int_\Omega \operatorname{Im} f\,d\mu,$$

provided $\int_\Omega \operatorname{Re} f\,d\mu$ and $\int_\Omega \operatorname{Im} f\,d\mu$ are both finite. In this case we say that f is $\mu$-integrable. Thus in working with complex-valued functions, we do not consider any cases in which integrals exist but are not finite.

The following result was established earlier for real-valued f [see 1.5.9(c)]; it is still valid in the complex case, but the proof must be modified.
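2.4.2 Lemma. If f is $\mu$-integrable,

$$\left| \int_\Omega f\,d\mu \right| \le \int_\Omega |f|\,d\mu.$$


Proof. If $\int_\Omega f\,d\mu = re^{i\theta}$, $r \ge 0$, then $\int_\Omega e^{-i\theta} f\,d\mu = r = |\int_\Omega f\,d\mu|$. But if $f(\omega) = \rho(\omega)e^{i\varphi(\omega)}$ (taking $\rho \ge 0$), then

$$\int_\Omega e^{-i\theta} f\,d\mu = \int_\Omega \rho\, e^{i(\varphi - \theta)}\,d\mu = \int_\Omega \rho \cos(\varphi - \theta)\,d\mu \quad \text{(since r is real)}$$

$$\le \int_\Omega \rho\,d\mu = \int_\Omega |f|\,d\mu. \quad □$$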


Many other standard properties of the integral carry over to the complex 
case, in particular 1.5.5(b), 1.5.9(a) and (e), 1.6.1, 1.6.3, 1.6.4(b) and (c), 
1.6.5, 1.6.9, 1.6.10, and 1.7.1. In almost all cases, the result is an immediate 
consequence of the fact that integrating a complex-valued function is equiva- 
lent to integrating the real and imaginary parts separately. Only two theorems 
require additional comment. To prove that h is integrable iff |h| is integrable [1.6.4(b)], use the fact that $|\operatorname{Re} h|, |\operatorname{Im} h| \le |h| \le |\operatorname{Re} h| + |\operatorname{Im} h|$. Finally, to prove the dominated convergence theorem (1.6.9), apply the real version of the theorem to $|f_n - f|$, and note that $|f_n - f| \le |f_n| + |f| \le 2g$.

If p > 0, we define the space $L^p = L^p(\Omega, \mathcal{F}, \mu)$ as the collection of all complex-valued Borel measurable functions f such that $\int_\Omega |f|^p\,d\mu < \infty$. We set

$$\|f\|_p = \left( \int_\Omega |f|^p\,d\mu \right)^{1/p}, \qquad f \in L^p.$$

It follows that for any complex number a, $\|af\|_p = |a|\,\|f\|_p$, $f \in L^p$.

We are going to show that $L^p$ forms a linear space over the complex field. The key steps in the proof are the Hölder and Minkowski inequalities, which we now develop.


2.4.3 Lemma. If $a, b, \alpha, \beta > 0$, $\alpha + \beta = 1$, then $a^\alpha b^\beta \le \alpha a + \beta b$.


Proof. The statement to be proved is equivalent to $-\log(\alpha a + \beta b) \le \alpha(-\log a) + \beta(-\log b)$, which holds because $-\log$ is convex. [If g has a nonnegative second derivative on the interval $I \subset R$, then g is convex on I, that is, $g(\alpha x + \beta y) \le \alpha g(x) + \beta g(y)$, $x, y \in I$, $\alpha, \beta > 0$, $\alpha + \beta = 1$. To see this, assume x < y and write

$$g(\alpha x + \beta y) - \alpha g(x) - \beta g(y) = \alpha[g(\alpha x + \beta y) - g(x)] + \beta[g(\alpha x + \beta y) - g(y)] = \alpha\beta(y - x)[g'(u) - g'(v)] \quad \text{for some } u, v$$

with $x \le u \le \alpha x + \beta y$, $\alpha x + \beta y \le v \le y$. But $g'(u) - g'(v) \le 0$ since $g'$ is increasing on I.] □


2.4.4 Corollary. If $c, d > 0$, $p, q > 1$, $(1/p) + (1/q) = 1$, then $cd \le (c^p/p) + (d^q/q)$.


Proof. In 2.4.3, let $\alpha = 1/p$, $\beta = 1/q$, $a = c^p$, $b = d^q$. □


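2.4.5 Hölder Inequality. Let $1 < p < \infty$, $1 < q < \infty$, $(1/p) + (1/q) = 1$. If $f \in L^p$ and $g \in L^q$, then $fg \in L^1$ and $\|fg\|_1 \le \|f\|_p \|g\|_q$.


Proof. In 2.4.4, take $c = |f(\omega)|/\|f\|_p$, $d = |g(\omega)|/\|g\|_q$ (the inequality is immediate if $\|f\|_p$ or $\|g\|_q = 0$). Then

$$\frac{|f(\omega)g(\omega)|}{\|f\|_p \|g\|_q} \le \frac{|f(\omega)|^p}{p\,\|f\|_p^p} + \frac{|g(\omega)|^q}{q\,\|g\|_q^q};$$

integrate to obtain

$$\frac{\|fg\|_1}{\|f\|_p \|g\|_q} \le \frac{1}{p} + \frac{1}{q} = 1. \quad □$$


When p = q = 2, we obtain

$$\int_\Omega |fg|\,d\mu \le \left[ \int_\Omega |f|^2\,d\mu \right]^{1/2} \left[ \int_\Omega |g|^2\,d\mu \right]^{1/2}$$

and thus, using 2.4.2, we have the Cauchy-Schwarz inequality: If f and $g \in L^2$, then $f\bar{g} \in L^1$ and

$$\left| \int_\Omega f\bar{g}\,d\mu \right| \le \left[ \int_\Omega |f|^2\,d\mu \right]^{1/2} \left[ \int_\Omega |g|^2\,d\mu \right]^{1/2},$$

where $\bar{g}$ is the complex conjugate of g. (The reason for replacing g by $\bar{g}$ is to make the inequality agree with the Hilbert space result to be discussed in Chapter 3.)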
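As a quick sanity check of 2.4.5, the Hölder and Cauchy-Schwarz inequalities can be evaluated on a measure with finitely many point masses, where every integral is a weighted finite sum. This is only a numerical illustration under that assumption; the function names and the sample data are made up for the purpose.

```python
def weighted_p_norm(values, weights, p):
    """(sum_i w_i |v_i|^p)^(1/p): the L^p norm for a measure with point masses w_i."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def holder_sides(f, g, weights, p):
    """Return (||fg||_1, ||f||_p ||g||_q) with q the conjugate exponent of p > 1."""
    q = p / (p - 1.0)
    lhs = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))
    rhs = weighted_p_norm(f, weights, p) * weighted_p_norm(g, weights, q)
    return lhs, rhs

# Complex-valued functions on a three-point space with masses 0.5, 2, 1.
f = [1 + 2j, -3.0, 0.5j]
g = [2.0, 1 - 1j, 4.0]
weights = [0.5, 2.0, 1.0]

for p in (1.5, 2.0, 3.0):
    lhs, rhs = holder_sides(f, g, weights, p)
    print(p, lhs <= rhs)

# Cauchy-Schwarz: |integral of f * conj(g)| is bounded by ||f||_2 ||g||_2.
inner = abs(sum(w * a * complex(b).conjugate() for a, b, w in zip(f, g, weights)))
print(inner <= holder_sides(f, g, weights, 2.0)[1])
```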
2.4.6 Lemma. If $a, b \ge 0$, $p \ge 1$, then $(a + b)^p \le 2^{p-1}(a^p + b^p)$.


Proof. Let $h(x) = d[(a + x)^p - 2^{p-1}(a^p + x^p)]/dx = p(a + x)^{p-1} - 2^{p-1} p\,x^{p-1}$; since p > 1 (the case p = 1 is trivial),

h(x) > 0 for a + x > 2x, that is, x < a,
h(x) = 0 at x = a,
h(x) < 0 for x > a.

The maximum therefore occurs at x = a; hence

$$(a + b)^p - 2^{p-1}(a^p + b^p) \le (a + a)^p - 2^{p-1}(a^p + a^p) = 0. \quad □$$


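2.4.7 Minkowski Inequality. If $f, g \in L^p$ $(1 \le p < \infty)$, then $f + g \in L^p$ and $\|f + g\|_p \le \|f\|_p + \|g\|_p$.


Proof. By 2.4.6, $|f + g|^p \le (|f| + |g|)^p \le 2^{p-1}(|f|^p + |g|^p)$, hence $f, g \in L^p$ implies $f + g \in L^p$. Now the inequality is clear when p = 1, so assume p > 1 and choose q such that $(1/p) + (1/q) = 1$. Then

$$|f + g|^p = |f + g|\,|f + g|^{p-1} \le |f|\,|f + g|^{p-1} + |g|\,|f + g|^{p-1}. \qquad (1)$$

Now $|f + g|^{p-1} \in L^q$; for

$$(p - 1)q = \frac{p - 1}{1/q} = \frac{p - 1}{1 - 1/p} = p,$$

hence

$$\int_\Omega \left[ |f + g|^{p-1} \right]^q d\mu = \int_\Omega |f + g|^p\,d\mu < \infty.$$

Since f and g belong to $L^p$ and $|f + g|^{p-1} \in L^q$, Hölder's inequality implies that $|f|\,|f + g|^{p-1}$ and $|g|\,|f + g|^{p-1} \in L^1$, and

$$\int_\Omega |f|\,|f + g|^{p-1}\,d\mu \le \|f\|_p \left[ \int_\Omega |f + g|^{(p-1)q}\,d\mu \right]^{1/q} = \|f\|_p\, \|f + g\|_p^{p/q}, \qquad (2)$$

$$\int_\Omega |g|\,|f + g|^{p-1}\,d\mu \le \|g\|_p\, \|f + g\|_p^{p/q}. \qquad (3)$$

By Eq. (1), $\|f + g\|_p^p \le (\|f\|_p + \|g\|_p)\,\|f + g\|_p^{p/q}$. Since $p - (p/q) = 1$, the result follows. □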
By Minkowski's inequality and the fact that $\|af\|_p = |a|\,\|f\|_p$ for $f \in L^p$, $L^p$ $(1 \le p < \infty)$ is a vector space over the complex field. Furthermore, there is a natural notion of distance in $L^p$, by virtue of the fact that $\|\ \|_p$ is a seminorm.


2.4.8 Definitions and Comments. A seminorm on a vector space L (over the real or complex field) is a real-valued function $\|\ \|$ on L, with the following properties:


$\|f\| \ge 0$,
$\|af\| = |a|\,\|f\|$ for each scalar a; consequently, if f = 0, then $\|f\| = 0$,
$\|f + g\| \le \|f\| + \|g\|$


(f and g are arbitrary elements of L). If $\|\ \|$ is a seminorm with the additional property that $\|f\| = 0$ implies f = 0, $\|\ \|$ is said to be a norm.


Now $\|\ \|_p$ is a seminorm on $L^p$; the first two properties follow from the definition of $\|\ \|_p$, and the last property is a consequence of Minkowski's inequality.

We can, in effect, change $\|\ \|_p$ into a norm by passing to equivalence classes as follows.

If $f, g \in L^p(\Omega, \mathcal{F}, \mu)$, define $f \sim g$ iff f = g a.e. [$\mu$]. Then $\|f\|_p$ is the same for all f in a given equivalence class, by 1.6.5(b). Thus if $\mathbf{L}^p$ is the collection of equivalence classes, $\mathbf{L}^p$ becomes a linear space, and $\|\ \|_p$ is a seminorm on $\mathbf{L}^p$. In fact $\|\ \|_p$ is a norm on $\mathbf{L}^p$, since $\|f\|_p = 0$ implies f = 0 a.e. [$\mu$], by 1.6.6(b).

If $\|\ \|$ is a seminorm on a vector space, we have a natural notion of distance: $d(f, g) = \|f - g\|$. By definition of seminorm we have


$d(f, g) \ge 0$,
$d(f, g) = 0$ if f = g,
$d(f, g) = d(g, f)$,
$d(f, h) \le d(f, g) + d(g, h)$.


Thus d has all the properties of a metric, except that d(f, g) = 0 does not necessarily imply f = g; we call d a pseudometric. If $\|\ \|$ is a norm, d is actually a metric. (There is an asymmetry of terminology between seminorm and pseudometric, but these terms seem to be most popular, although "pseudonorm" is sometimes used, as is "semimetric.")
One of the first questions that arises in any metric space is the problem of completeness; we ask whether or not Cauchy sequences converge. We are going to show that the $L^p$ spaces are complete. The following result will be needed; students of probability are likely to recognize it immediately, but it appears in other parts of analysis as well.


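2.4.9 Chebyshev's Inequality. Let f be a nonnegative, extended real-valued, Borel measurable function on $(\Omega, \mathcal{F}, \mu)$. If $0 < p < \infty$ and $0 < \varepsilon < \infty$,

$$\mu\{\omega: f(\omega) \ge \varepsilon\} \le \frac{1}{\varepsilon^p} \int_\Omega f^p\,d\mu.$$

The following version is often applied in probability. If g is an extended real-valued Borel measurable function on $(\Omega, \mathcal{F})$ and P is a probability measure on $\mathcal{F}$, define

$$m = \int_\Omega g\,dP \quad \text{(assumed finite, so that g is finite a.e. [P])},$$

$$\sigma^2 = \int_\Omega (g - m)^2\,dP.$$

If $0 < k < \infty$,

$$P\{\omega: |g(\omega) - m| \ge k\sigma\} \le \frac{1}{k^2}.$$

This follows from the first version with $f = |g - m|$, $\varepsilon = k\sigma$, p = 2.

Proof.

$$\int_\Omega f^p\,d\mu \ge \int_{\{\omega:\, f(\omega) \ge \varepsilon\}} f^p\,d\mu \ge \varepsilon^p\, \mu\{\omega: f(\omega) \ge \varepsilon\}. \quad □$$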
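The probabilistic version of 2.4.9 is easy to check on a finite probability space. The following is a small sketch under that assumption, with exact rational arithmetic; the function name and the three-point distribution are chosen only for illustration. The example also shows that the bound $1/k^2$ can be attained.

```python
from fractions import Fraction

def chebyshev_sides(values, probs, k):
    """Return (P{|g - m| >= k*sigma}, 1/k^2) for g on a finite probability space."""
    m = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - m) ** 2 for v, p in zip(values, probs))
    # |g - m| >= k*sigma is equivalent to (g - m)^2 >= k^2 * sigma^2,
    # which avoids taking a square root.
    tail = sum((p for v, p in zip(values, probs) if (v - m) ** 2 >= k * k * var),
               Fraction(0))
    return tail, Fraction(1) / (k * k)

# g takes the values -2, 0, 2 with probabilities 1/8, 3/4, 1/8, so m = 0 and sigma^2 = 1.
values = [Fraction(-2), Fraction(0), Fraction(2)]
probs = [Fraction(1, 8), Fraction(3, 4), Fraction(1, 8)]

print(chebyshev_sides(values, probs, Fraction(2)))     # equality: both sides are 1/4
print(chebyshev_sides(values, probs, Fraction(3, 2)))  # strict inequality
```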


One more auxiliary result will be needed. 


2.4.10 Lemma. If $g_1, g_2, \ldots \in L^p$ (p > 0) and $\|g_k - g_{k+1}\|_p < (\tfrac{1}{4})^k$, $k = 1, 2, \ldots$, then $\{g_k\}$ converges a.e.


Proof. Let $A_k = \{\omega: |g_k(\omega) - g_{k+1}(\omega)| \ge 2^{-k}\}$. Then by 2.4.9,

$$\mu(A_k) \le 2^{kp}\, \|g_k - g_{k+1}\|_p^p < 2^{-kp}.$$

By 2.2.4, $\mu(\limsup_n A_n) = 0$. But if $\omega \notin \limsup_n A_n$, then $|g_k(\omega) - g_{k+1}(\omega)| < 2^{-k}$ for large k, so $\{g_k(\omega)\}$ is a Cauchy sequence of complex numbers, and therefore converges. □


Now, the main result: 
2.4.11 Completeness of $L^p$, $1 \le p < \infty$. If $f_1, f_2, \ldots$ form a Cauchy sequence in $L^p$, that is, $\|f_n - f_m\|_p \to 0$ as $n, m \to \infty$, there is an $f \in L^p$ such that $\|f_n - f\|_p \to 0$.


Proof. Let $n_1$ be such that $\|f_n - f_m\|_p < \tfrac{1}{4}$ for $n, m \ge n_1$, and let $g_1 = f_{n_1}$. In general, having chosen $g_1, \ldots, g_k$ and $n_1, \ldots, n_k$, let $n_{k+1} > n_k$ be such that $\|f_n - f_m\|_p < (\tfrac{1}{4})^{k+1}$ for $n, m \ge n_{k+1}$, and let $g_{k+1} = f_{n_{k+1}}$. By 2.4.10, $g_k$ converges a.e. to a limit function f.

Given $\varepsilon > 0$, choose N such that $\|f_n - f_m\|_p^p < \varepsilon$ for $n, m \ge N$. Fix $n \ge N$ and let $m \to \infty$ through values in the subsequence, that is, let $m = n_k$, $k \to \infty$. Then

$$\varepsilon \ge \liminf_{k \to \infty} \|f_n - f_{n_k}\|_p^p = \liminf_{k \to \infty} \int_\Omega |f_n - f_{n_k}|^p\,d\mu \ge \int_\Omega \liminf_{k \to \infty} |f_n - f_{n_k}|^p\,d\mu \quad \text{(by Fatou's lemma)} \quad = \|f_n - f\|_p^p.$$

Thus $\|f_n - f\|_p \to 0$. Since $f = f - f_n + f_n$, we have $f \in L^p$. □


2.4.12 Examples and Comments. Let $\Omega$ be the positive integers; take $\mathcal{F}$ as all subsets of $\Omega$, and let $\mu$ be counting measure. A real-valued function on $\Omega$ may be represented as a sequence of real numbers; we write $f = \{a_n, n = 1, 2, \ldots\}$. An integral on this space is really a sum [see Problem 1(a)]:

$$\int_\Omega f\,d\mu = \sum_{n=1}^\infty a_n,$$

where the series is interpreted as $\sum_{n=1}^\infty a_n^+ - \sum_{n=1}^\infty a_n^-$ if this is not of the form $+\infty - \infty$ (if it is, the integral does not exist). Thus the following cases occur:


(1) $\sum_{n=1}^\infty a_n^+ = \infty$, $\sum_{n=1}^\infty a_n^- < \infty$. The series diverges to $\infty$ and the integral is $\infty$.

(2) $\sum_{n=1}^\infty a_n^+ < \infty$, $\sum_{n=1}^\infty a_n^- = \infty$. The series diverges to $-\infty$ and the integral is $-\infty$.

(3) $\sum_{n=1}^\infty a_n^+ < \infty$, $\sum_{n=1}^\infty a_n^- < \infty$. The series is absolutely convergent and the integral equals the sum of the series.

(4) $\sum_{n=1}^\infty a_n^+ = \infty$, $\sum_{n=1}^\infty a_n^- = \infty$. The series is not absolutely convergent; it may or may not converge conditionally. Whether it does or not, the integral does not exist. Thus when summation is considered from the point of view of Lebesgue integration theory, series that converge conditionally but not absolutely are ignored.


If $\mu$ is changed so that $\mu\{n\}$ is a nonnegative number $p_n$, not necessarily 1 as in the case of counting measure, the same analysis shows that

$$\int_\Omega f\,d\mu = \sum_{n=1}^\infty p_n a_n,$$

where the series is interpreted as $\sum_{n=1}^\infty p_n a_n^+ - \sum_{n=1}^\infty p_n a_n^-$.

If $f = \{a_n, n = 1, 2, \ldots\}$ is a sequence of complex numbers and $\mu$ is counting measure,

$$\int_\Omega f\,d\mu = \sum_{n=1}^\infty a_n = \sum_{n=1}^\infty \operatorname{Re} a_n + i \sum_{n=1}^\infty \operatorname{Im} a_n;$$

the integral is defined provided $\sum_{n=1}^\infty |a_n| < \infty$.

Now let $\Omega$ be an arbitrary set, and take $\mathcal{F}$ as all subsets of $\Omega$ and $\mu$ as counting measure. If $f = (f(\alpha), \alpha \in \Omega)$ is a nonnegative real-valued function on $\Omega$, then [Problem 1(b)]

$$\int_\Omega f\,d\mu = \sum_\alpha f(\alpha), \qquad (1)$$

where the series is defined as $\sup\{\sum_{\alpha \in F} f(\alpha): F \subset \Omega, F \text{ finite}\}$. If $f(\alpha) > 0$ for uncountably many $\alpha$, then for some $\delta > 0$ we have $f(\alpha) \ge \delta$ for infinitely many $\alpha$, so that $\sum_\alpha f(\alpha) = \infty$.

If the nonnegativity hypothesis is dropped, we apply the above results to $f^+$ and $f^-$ to again obtain Eq. (1), where the series is interpreted as $\sum_\alpha f^+(\alpha) - \sum_\alpha f^-(\alpha)$. If f is complex-valued, Eq. (1) still applies, with the series interpreted as $\sum_\alpha \operatorname{Re} f(\alpha) + i \sum_\alpha \operatorname{Im} f(\alpha)$. The integral is defined provided $\sum_\alpha |f(\alpha)| < \infty$.

The space $L^p(\Omega, \mathcal{F}, \mu)$ will be denoted by $l^p(\Omega)$; it consists of all complex-valued functions $(f(\alpha), \alpha \in \Omega)$ such that $f(\alpha) = 0$ for all but countably many $\alpha$, and

$$\|f\|_p^p = \sum_\alpha |f(\alpha)|^p < \infty.$$

If $\Omega$ is the set of positive integers, the space $l^p(\Omega)$ will be denoted simply by $l^p$; it consists of all sequences $f = \{a_n\}$ of complex numbers such that

$$\|f\|_p^p = \sum_{n=1}^\infty |a_n|^p < \infty.$$
n=] 
It will be useful to state the Hölder and Minkowski inequalities for sums. If $f \in l^p(\Omega)$ and $g \in l^q(\Omega)$, where $1 < p < \infty$, $1 < q < \infty$, $(1/p) + (1/q) = 1$, then $fg \in l^1(\Omega)$ and

$$\sum_\alpha |f(\alpha)g(\alpha)| \le \left( \sum_\alpha |f(\alpha)|^p \right)^{1/p} \left( \sum_\alpha |g(\alpha)|^q \right)^{1/q}.$$

If $f, g \in l^p(\Omega)$, $1 \le p < \infty$, then $f + g \in l^p(\Omega)$ and

$$\left( \sum_\alpha |f(\alpha) + g(\alpha)|^p \right)^{1/p} \le \left( \sum_\alpha |f(\alpha)|^p \right)^{1/p} + \left( \sum_\alpha |g(\alpha)|^p \right)^{1/p}.$$

As in 2.4.5, we obtain the Cauchy-Schwarz inequality for sums from the Hölder inequality. If $f, g \in l^2(\Omega)$, then $f\bar{g} \in l^1(\Omega)$ and

$$\left| \sum_\alpha f(\alpha)\overline{g(\alpha)} \right| \le \left( \sum_\alpha |f(\alpha)|^2 \right)^{1/2} \left( \sum_\alpha |g(\alpha)|^2 \right)^{1/2}.$$

If in the above discussion we replace $\Omega$ by $\{1, 2, \ldots, n\}$, all convergence difficulties are eliminated, and all the spaces $l^p(\Omega)$ coincide with $C^n$.

If 0 < p < 1, $\|\ \|_p$ is not a seminorm on $L^p(\Omega, \mathcal{F}, \mu)$. For let A and B be disjoint sets with $a = \mu(A)$ and $b = \mu(B)$ assumed finite and positive. If $f = I_A$, $g = I_B$, then

$$\|f + g\|_p = \left( \int_\Omega |f + g|^p\,d\mu \right)^{1/p} = \left( \int_\Omega (I_A + I_B)\,d\mu \right)^{1/p} = (a + b)^{1/p},$$

$$\|f\|_p = a^{1/p}, \qquad \|g\|_p = b^{1/p}.$$

But $(a + b)^{1/p} > a^{1/p} + b^{1/p}$ if a, b > 0, 0 < p < 1, since $(a + x)^r - a^r - x^r$ is strictly increasing in x for r > 1, and has the value 0 when x = 0. Thus the triangle inequality fails. We can, however, describe convergence in $L^p$, 0 < p < 1, in the following way. We use the inequality

$$(a + b)^p \le a^p + b^p, \qquad a, b \ge 0, \quad 0 < p < 1,$$

which is proved by considering $(a + x)^p - a^p - x^p$. It follows that

$$\int_\Omega |f + g|^p\,d\mu \le \int_\Omega |f|^p\,d\mu + \int_\Omega |g|^p\,d\mu, \qquad f, g \in L^p, \qquad (2)$$

and therefore $d(f, g) = \int_\Omega |f - g|^p\,d\mu$ defines a pseudometric on $L^p$. In fact the pseudometric is complete (every Cauchy sequence converges); for Eq. (2) implies that if $f, g \in L^p$, then $f + g \in L^p$, so that the proof of 2.4.11 goes through.
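The failure of the triangle inequality for 0 < p < 1, and the fact that $d(f, g) = \int |f - g|^p\,d\mu$ still behaves like a distance, can both be seen on a two-point space. This is a minimal numerical sketch, assuming a measure with two atoms of mass a = b = 1; the function names and sample vectors are illustrative only.

```python
def lp_functional(values, weights, p):
    """(sum_i w_i |v_i|^p)^(1/p); a norm for p >= 1, but not for 0 < p < 1."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def d(f, g, weights, p):
    """The pseudometric d(f, g) = integral of |f - g|^p (no p-th root) for 0 < p < 1."""
    return sum(w * abs(x - y) ** p for x, y, w in zip(f, g, weights))

weights = [1.0, 1.0]   # mu(A) = a = 1, mu(B) = b = 1
f = [1.0, 0.0]         # indicator of A
g = [0.0, 1.0]         # indicator of B
p = 0.5

s = [x + y for x, y in zip(f, g)]
# (a + b)^(1/p) = 4, while a^(1/p) + b^(1/p) = 2: the triangle inequality fails.
print(lp_functional(s, weights, p), lp_functional(f, weights, p) + lp_functional(g, weights, p))

# The pseudometric d does satisfy the triangle inequality, here for one sample h.
h = [0.25, 4.0]
print(d(f, h, weights, p) <= d(f, g, weights, p) + d(g, h, weights, p))
```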

If $\Omega$ is an interval of reals, $\mathcal{F}$ is the class of Borel sets of $\Omega$, and $\mu$ is Lebesgue measure, the space $L^p(\Omega, \mathcal{F}, \mu)$ will be denoted by $L^p(\Omega)$. Thus, for example, $L^p[a, b]$ is the set of all complex-valued Borel measurable functions f on [a, b] such that

$$\|f\|_p^p = \int_a^b |f(x)|^p\,dx < \infty.$$

If f is a complex-valued Borel measurable function on $(\Omega, \mathcal{F}, \mu)$ and $f_1, f_2, \ldots \in L^p(\Omega, \mathcal{F}, \mu)$, we say that the sequence $\{f_n\}$ converges to f in $L^p$ iff $\|f_n - f\|_p \to 0$, that is, iff $\int_\Omega |f_n - f|^p\,d\mu \to 0$ as $n \to \infty$. We use the notation $f_n \xrightarrow{L^p} f$. In Section 2.5, we shall compare various types of convergence of sequences of measurable functions. We show now that any $f \in L^p$ is an $L^p$-limit of simple functions.


2.4.13 Theorem. Let $f \in L^p$, $0 < p < \infty$. If $\varepsilon > 0$, there is a simple function $g \in L^p$ such that $\|f - g\|_p < \varepsilon$; g can be chosen to be finite-valued and to satisfy $|g| \le |f|$. Thus the finite-valued simple functions are dense in $L^p$.


Proof. This follows from 1.5.5(b) and 1.6.10. □


If we specialize to functions on $R^n$ and Lebesgue-Stieltjes measures, we may obtain another basic approximation theorem.


2.4.14 Theorem. Let $f \in L^p(\Omega, \mathcal{F}, \mu)$, $0 < p < \infty$, where $\Omega = R^n$, $\mathcal{F} = \mathcal{B}(R^n)$, and $\mu$ is a Lebesgue-Stieltjes measure. If $\varepsilon > 0$, there is a continuous function $g \in L^p(\Omega, \mathcal{F}, \mu)$ such that $\|f - g\|_p < \varepsilon$; furthermore, g can be chosen so that $\sup |g| \le \sup |f|$. Thus the continuous functions are dense in $L^p$.


Proof. By 2.4.13, it suffices to show that an indicator $I_A$ in $L^p$ can be approximated in the $L^p$ sense by a continuous function with absolute value at most 1. Now $I_A \in L^p$ means that $\mu(A) < \infty$; hence by 1.4.11, there is a closed set $C \subset A$ and an open set $V \supset A$ such that $\mu(V - C) < \varepsilon^p 2^{-p}$. Let g be a continuous map of $\Omega$ into [0, 1] with g = 1 on C and g = 0 on $V^c$ (g exists by Urysohn's lemma). Then

$$\int_\Omega |I_A - g|^p\,d\mu = \int_{\{I_A \ne g\}} |I_A - g|^p\,d\mu.$$

But $\{I_A \ne g\} \subset V - C$ and $|I_A - g| \le 2$; hence

$$\|I_A - g\|_p^p \le 2^p \mu(V - C) < \varepsilon^p.$$

Since $g = g - I_A + I_A$, we have $g \in L^p$. □


If $f_1, f_2, \ldots$ are continuous and $f_n$ converges to f in $L^p$, it does not follow that f is continuous (see Problem 2).


2.4.15 The Space $L^\infty$. If we wish to define $L^p$ spaces for $p = \infty$, we must proceed differently. We define the essential supremum of the real-valued Borel measurable function g on $(\Omega, \mathcal{F}, \mu)$ as

$$\operatorname{ess\,sup} g = \inf\{c \in R: \mu\{\omega: g(\omega) > c\} = 0\},$$

that is, the smallest number c such that $g \le c$ a.e. [$\mu$].

If f is a complex-valued Borel measurable function on $(\Omega, \mathcal{F}, \mu)$, we define

$$\|f\|_\infty = \operatorname{ess\,sup} |f|.$$

The space $L^\infty(\Omega, \mathcal{F}, \mu)$ is the collection of all f such that $\|f\|_\infty < \infty$. Thus $f \in L^\infty$ iff f is essentially bounded, that is, bounded outside a set of measure 0.
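For a measure concentrated on finitely many points, the definition of the essential supremum reduces to a maximum over the points of positive mass, which makes the difference between sup and ess sup easy to see. The sketch below assumes such a finite setting; the function name and data are hypothetical.

```python
def ess_sup(values, masses):
    """Essential supremum of a real function on a finite set with point masses.

    Points of mass 0 form a null set, so only values at points of positive mass count.
    If every point has mass 0, every c satisfies mu{g > c} = 0 and the infimum is -inf.
    """
    charged = [v for v, m in zip(values, masses) if m > 0]
    return max(charged) if charged else float("-inf")

values = [1.0, 5.0, 100.0]
masses = [1.0, 2.0, 0.0]   # the point where g = 100 carries no mass

print(max(values), ess_sup(values, masses))
```

Here the ordinary supremum is 100 while the essential supremum is 5, since g exceeds 5 only on a set of measure 0.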

Now $|f + g| \le |f| + |g| \le \|f\|_\infty + \|g\|_\infty$ a.e., hence

$$\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty.$$

In particular, $f, g \in L^\infty$ implies $f + g \in L^\infty$. The other properties of a seminorm are easily checked. Thus $L^\infty$ is a vector space over the complex field, $\|\ \|_\infty$ is a seminorm on $L^\infty$, and becomes a norm if we pass to equivalence classes as before.


If $f, f_1, f_2, \ldots \in L^\infty$ and $\|f_n - f\|_\infty \to 0$, we write $f_n \xrightarrow{L^\infty} f$; we claim that $\|f_n - f\|_\infty \to 0$ iff there is a set $A \in \mathcal{F}$ with $\mu(A) = 0$ such that $f_n \to f$ uniformly on $A^c$.

Assume $\|f_n - f\|_\infty \to 0$. Given a positive integer m, $\|f_n - f\|_\infty \le 1/m$ for sufficiently large n; hence $|f_n(\omega) - f(\omega)| \le 1/m$ for almost every $\omega$, say for $\omega \notin A_m$, where $\mu(A_m) = 0$. If $A = \bigcup_{m=1}^\infty A_m$, then $\mu(A) = 0$ and $f_n \to f$ uniformly on $A^c$. Conversely, assume $\mu(A) = 0$ and $f_n \to f$ uniformly on $A^c$. Given $\varepsilon > 0$, $|f_n - f| \le \varepsilon$ on $A^c$ for sufficiently large n, so that $|f_n - f| \le \varepsilon$ a.e. Thus $\|f_n - f\|_\infty \le \varepsilon$ for large enough n, and the result follows.

An identical argument shows that $\{f_n\}$ is a Cauchy sequence in $L^\infty$ ($\|f_n - f_m\|_\infty \to 0$ as $n, m \to \infty$) iff there is a set $A \in \mathcal{F}$ with $\mu(A) = 0$ and $f_n - f_m \to 0$ uniformly on $A^c$.
It is immediate that the Hölder inequality still holds when p = 1, $q = \infty$, and we have shown above that the Minkowski inequality holds when $p = \infty$.

To show that $L^\infty$ is complete, let $\{f_n\}$ be a Cauchy sequence in $L^\infty$, and let A be a set of measure 0 such that $f_n(\omega) - f_m(\omega) \to 0$ uniformly for $\omega \in A^c$. But then $f_n(\omega)$ converges to a limit $f(\omega)$ for each $\omega \in A^c$, and the convergence is uniform on $A^c$. If we define $f(\omega) = 0$ for $\omega \in A$, we have $f \in L^\infty$ and $f_n \xrightarrow{L^\infty} f$.

Theorem 2.4.13 holds also when $p = \infty$. For if f is a function in $L^\infty$, the standard approximating sequence $\{f_n\}$ of simple functions (see 1.5.5) converges to f uniformly, outside a set of measure 0. However, Theorem 2.4.14 fails when $p = \infty$ (see Problem 12).

If $\Omega$ is an arbitrary set, $\mathcal{F}$ consists of all subsets of $\Omega$, and $\mu$ is counting measure, then $L^\infty(\Omega, \mathcal{F}, \mu)$ is the set of all bounded complex-valued functions $f = (f(\alpha), \alpha \in \Omega)$, denoted by $l^\infty(\Omega)$. The essential supremum is simply the supremum; in other words, $\|f\|_\infty = \sup\{|f(\alpha)|: \alpha \in \Omega\}$. If $\Omega$ is the set of positive integers, $l^\infty(\Omega)$ is the space of bounded sequences of complex numbers, denoted simply by $l^\infty$.


Problems 


1. (a) If f ={a,,n =1,2,...}, the a, are real or complex numbers, and 
jz is counting measure on subsets of the positive integers, show that 

fa f du = Sy, an, Where the sum is interpreted as in 2.4.12. 
(b) If f = (f(a), a € Q) is a real- or complex-valued function on the 
arbitrary set (2, and u is counting measure on subsets of Q, show 
that fo f du = 3°, f(a), where the sum is interpreted as in 2.4.12. 


2. Give an example of functions f, fi, f2, ...from R to [0, 1] such that 


(a) each f, is continuous on R, 

(b) f(x) converges to f(x) for all x, f% | fn(x) — f@)|? dx > 0 for 
every p € (0, œ), and 

(c) f is discontinuous at some point of R. 


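3. For each $n = 1, 2, \ldots$, let $f_n = \{a_1^{(n)}, a_2^{(n)}, \ldots\}$ be a sequence of complex numbers.

(a) If the $a_k^{(n)}$ are real and $0 \le a_k^{(n)} \le a_k^{(n+1)}$ for all $k$ and $n$, show that

$$\lim_{n \to \infty} \sum_{k=1}^\infty a_k^{(n)} = \sum_{k=1}^\infty \lim_{n \to \infty} a_k^{(n)}.$$

Show that the same conclusion holds if the $a_k^{(n)}$ are complex, $\lim_{n \to \infty} a_k^{(n)}$ exists for each $k$, and $|a_k^{(n)}| \le b_k$ for all $k$ and $n$, where $\sum_{k=1}^\infty b_k < \infty$.

(b) If the $a_k^{(n)}$ are real and nonnegative, show that

$$\sum_{n=1}^\infty \sum_{k=1}^\infty a_k^{(n)} = \sum_{k=1}^\infty \sum_{n=1}^\infty a_k^{(n)}.$$

(c) If the $a_k^{(n)}$ are complex and $\sum_{n=1}^\infty \sum_{k=1}^\infty |a_k^{(n)}| < \infty$, show that $\sum_{n=1}^\infty \sum_{k=1}^\infty a_k^{(n)}$ and $\sum_{k=1}^\infty \sum_{n=1}^\infty a_k^{(n)}$ both converge to the same finite number.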

4. Show that there is equality in the Hölder inequality iff $|f|^p$ and $|g|^q$ are linearly dependent, that is, iff $A|f|^p = B|g|^q$ a.e. for some constants $A$ and $B$, not both 0.

5. If $f$ is a complex-valued $\mu$-integrable function, show that $\left|\int_\Omega f\, d\mu\right| = \int_\Omega |f|\, d\mu$ iff $\arg f$ is a.e. constant on $\{\omega: f(\omega) \ne 0\}$.

6. Show that equality holds in the Cauchy–Schwarz inequality iff $f$ and $g$ are linearly dependent.

7. (a) If $1 < p < \infty$, show that equality holds in the Minkowski inequality iff $Af = Bg$ a.e. for some nonnegative constants $A$ and $B$, not both 0.
(b) What are the conditions for equality if $p = 1$?

8. If $0 < r < s < \infty$, and $f \in L^s(\Omega, \mathcal{F}, \mu)$, $\mu$ finite, show that $\|f\|_r \le k\|f\|_s$ for some finite positive constant $k$. Thus $L^s \subset L^r$ and $L^s$ convergence implies $L^r$ convergence. (We may take $k = 1$ if $\mu$ is a probability measure.) Note that finiteness of $\mu$ is essential here; if $\mu$ is Lebesgue measure on $\mathcal{B}(\mathbb{R})$ and $f(x) = 1/x$ for $x \ge 1$, $f(x) = 0$ for $x < 1$, then $f \in L^2$ but $f \notin L^1$.

9. If $\mu$ is finite, show that $\|f\|_p \to \|f\|_\infty$ as $p \to \infty$. Give an example to show that this fails if $\mu(\Omega) = \infty$.


10. (Radon–Nikodym theorem, complex case) If $\mu$ is a nonnegative, real measure, $\lambda$ a complex measure on $(\Omega, \mathcal{F})$, and $\lambda \ll \mu$, show that there is a complex-valued $\mu$-integrable function $g$ such that $\lambda(A) = \int_A g\, d\mu$ for all $A \in \mathcal{F}$. If $h$ is another such function, $g = h$ a.e.

Show also that the Lebesgue decomposition theorem holds if $\lambda$ is a complex measure and $\mu$ is a $\sigma$-finite measure. (See Problem 6, Section 2.2, for properties of complex measures.)


11. (a) Let $f$ be a complex-valued $\mu$-integrable function, where $\mu$ is a nonnegative real measure. If $S$ is a closed set of complex numbers and $[1/\mu(E)] \int_E f\, d\mu \in S$ for all measurable sets $E$ such that $\mu(E) > 0$, show that $f(\omega) \in S$ for almost every $\omega$. [If $D$ is a closed disk with center at $z$ and radius $r$, and $D \subset S^c$, take $E = f^{-1}(D)$. Show that $\left|\int_E (f - z)\, d\mu\right| \le r\mu(E)$, and conclude that $\mu(E) = 0$.]

(b) If $\lambda$ is a complex measure, then $\lambda \ll |\lambda|$ by definition of $|\lambda|$; hence by the Radon–Nikodym theorem, there is a $|\lambda|$-integrable complex-valued function $h$ such that $\lambda(E) = \int_E h\, d|\lambda|$ for all $E \in \mathcal{F}$. Show that $|h| = 1$ a.e. $[|\lambda|]$. [Let $A_r = \{\omega: |h(\omega)| < r\}$, $0 < r < 1$, and use the definition of $|\lambda|$ to show that $|h| \ge 1$ a.e. Use part (a) to show $|h| \le 1$ a.e.]

(c) Let $\mu$ be a nonnegative real measure, $g$ a complex-valued $\mu$-integrable function, and $\lambda(E) = \int_E g\, d\mu$, $E \in \mathcal{F}$. If $h = d\lambda/d|\lambda|$ as in part (b), show that $|\lambda|(E) = \int_E \bar{h} g\, d\mu$. (Intuitively, $\bar{h} g\, d\mu = \bar{h}\, d\lambda = \bar{h} h\, d|\lambda| = |h|^2\, d|\lambda| = d|\lambda|$. Formally, show that $\int_\Omega f h\, d|\lambda| = \int_\Omega f g\, d\mu$ if $f$ is a bounded, complex-valued, Borel measurable function, and set $f = \bar{h} I_E$.)

(d) Under the hypothesis of (c), show that

$$|\lambda|(E) = \int_E |g|\, d\mu \quad \text{for all } E \in \mathcal{F}.$$


12. Give an example of a bounded real-valued function $f$ on $\mathbb{R}$ such that there is no sequence of continuous functions $f_n$ such that $\|f - f_n\|_\infty \to 0$. Thus the continuous functions are not dense in $L^\infty(\mathbb{R})$.


2.5 CONVERGENCE OF SEQUENCES OF MEASURABLE FUNCTIONS 

In the previous section we introduced the notion of $L^p$ convergence; we are also familiar with convergence almost everywhere. We now consider other types of convergence and make comparisons.

Let $f, f_1, f_2, \ldots$ be complex-valued Borel measurable functions on $(\Omega, \mathcal{F}, \mu)$. We say that $f_n \to f$ in measure (or in $\mu$-measure if we wish to emphasize the dependence on $\mu$) iff for every $\varepsilon > 0$, $\mu\{\omega: |f_n(\omega) - f(\omega)| \ge \varepsilon\} \to 0$ as $n \to \infty$. (Notation: $f_n \xrightarrow{\mu} f$.) When $\mu$ is a probability measure, the convergence is called convergence in probability.

The first result shows that $L^p$ convergence is stronger than convergence in measure.


2.5.1 Theorem. If $f, f_1, f_2, \ldots \in L^p$ $(0 < p < \infty)$, then $f_n \xrightarrow{L^p} f$ implies $f_n \xrightarrow{\mu} f$.

Proof. Apply Chebyshev's inequality (2.4.9) to $|f_n - f|$. □

The same argument shows that if $\{f_n\}$ is a Cauchy sequence in $L^p$, then $\{f_n\}$ is Cauchy in measure, that is, given $\varepsilon > 0$, $\mu\{\omega: |f_n(\omega) - f_m(\omega)| \ge \varepsilon\} \to 0$ as $n, m \to \infty$.
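
The Chebyshev step behind 2.5.1 can be checked numerically. The following is a minimal sketch, assuming a hypothetical four-point measure space with weights `mu` and a hypothetical function `g` standing in for $f_n - f$ (all values are illustrative and not from the text). It verifies $\mu\{|g| \ge \varepsilon\} \le \varepsilon^{-p} \int |g|^p\, d\mu$ for a few integer exponents.

```python
from fractions import Fraction

# Hypothetical finite measure space: point -> mass (illustrative values only).
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
# Hypothetical values of g = f_n - f at each point.
g = {0: Fraction(0), 1: Fraction(1, 10), 2: Fraction(2), 3: Fraction(-3)}


def measure_of_large_set(eps):
    """Return mu{w : |g(w)| >= eps}."""
    return sum(m for w, m in mu.items() if abs(g[w]) >= eps)


def chebyshev_bound(eps, p):
    """Return eps^(-p) times the integral of |g|^p, for a positive integer p."""
    integral = sum(abs(g[w]) ** p * m for w, m in mu.items())
    return integral / eps ** p


eps = Fraction(1, 2)
for p in (1, 2, 3):
    # The measure of the set where |g| is large is controlled by the p-th moment.
    assert measure_of_large_set(eps) <= chebyshev_bound(eps, p)
```

If the $p$-th moment of $f_n - f$ tends to 0, the right side tends to 0 for each fixed $\varepsilon$, which is exactly convergence in measure.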
If $f, f_1, f_2, \ldots$ are complex-valued Borel measurable functions on $(\Omega, \mathcal{F}, \mu)$, we say that $f_n \to f$ almost uniformly iff, given $\varepsilon > 0$, there is a set $A \in \mathcal{F}$ such that $\mu(A) < \varepsilon$ and $f_n \to f$ uniformly on $A^c$.

Almost uniform convergence is stronger than both a.e. convergence and 
convergence in measure, as we now prove. 


2.5.2 Theorem. If $f_n \to f$ almost uniformly, then $f_n \to f$ in measure and almost everywhere.


Proof. If $\varepsilon > 0$, let $f_n \to f$ uniformly on $A^c$, with $\mu(A) < \varepsilon$. If $\delta > 0$, then eventually $|f_n - f| < \delta$ on $A^c$, so $\{|f_n - f| \ge \delta\} \subset A$. Therefore $\mu\{|f_n - f| \ge \delta\} \le \mu(A) < \varepsilon$, proving convergence in measure.

To prove almost everywhere convergence, choose, for each positive integer $k$, a set $A_k$ with $\mu(A_k) < 1/k$ and $f_n \to f$ uniformly on $A_k^c$. If $B = \bigcup_{k=1}^\infty A_k^c$, then $f_n \to f$ on $B$ and $\mu(B^c) = \mu\left(\bigcap_{k=1}^\infty A_k\right) \le \mu(A_k) \to 0$ as $k \to \infty$. Thus $\mu(B^c) = 0$ and the result follows. □


The converse to 2.5.2 does not hold in general, as we shall see in 2.5.6(c), 
but we do have the following result. 


2.5.3 Theorem. If $\{f_n\}$ is convergent in measure, there is a subsequence converging almost uniformly (in particular, a.e. and (of course) in measure) to the same limit function.


Proof. First note that $\{f_n\}$ is Cauchy in measure, because if $|f_n - f_m| \ge \varepsilon$, then either $|f_n - f| \ge \varepsilon/2$ or $|f - f_m| \ge \varepsilon/2$. Thus

$$\mu\{|f_n - f_m| \ge \varepsilon\} \le \mu\left\{|f_n - f| \ge \frac{\varepsilon}{2}\right\} + \mu\left\{|f - f_m| \ge \frac{\varepsilon}{2}\right\} \to 0 \quad \text{as } n, m \to \infty.$$

Now for each positive integer $k$, choose a positive integer $N_k$ such that $N_{k+1} > N_k$ for all $k$ and

$$\mu\{\omega: |f_n(\omega) - f_m(\omega)| \ge 2^{-k}\} \le 2^{-k} \quad \text{for } n, m \ge N_k.$$

Pick integers $n_k \ge N_k$ with $n_k < n_{k+1}$, $k = 1, 2, \ldots$; then if $g_k = f_{n_k}$,

$$\mu\{\omega: |g_k(\omega) - g_{k+1}(\omega)| \ge 2^{-k}\} \le 2^{-k}.$$

Let $A_k = \{|g_k - g_{k+1}| \ge 2^{-k}\}$, $A = \limsup_k A_k$. Then $\mu(A) = 0$ by 2.2.4; but if $\omega \notin A$, then $\omega \in A_k$ for only finitely many $k$; hence $|g_k(\omega) - g_{k+1}(\omega)| < 2^{-k}$ for large $k$, and it follows that $g_k(\omega)$ converges to a limit $g(\omega)$. Since $\mu(A) = 0$ we have $g_k \to g$ a.e.

If $B_r = \bigcup_{k=r}^\infty A_k$, then $\mu(B_r) \le \sum_{k=r}^\infty \mu(A_k) < \varepsilon$ for large $r$. If $\omega \notin B_r$, then $|g_k(\omega) - g_{k+1}(\omega)| < 2^{-k}$, $k = r, r+1, r+2, \ldots$. By the Weierstrass M-test, $g_k \to g$ uniformly on $B_r^c$, which proves almost uniform convergence.

Now by hypothesis, we have $f_n \xrightarrow{\mu} f$ for some $f$, hence $f_{n_k} \xrightarrow{\mu} f$. But by 2.5.2, $f_{n_k} \xrightarrow{\mu} g$ as well, hence $f = g$ a.e. (see Problem 1). Thus $f_{n_k}$ converges almost uniformly to $f$, completing the proof. □


There is a partial converse to 2.5.2, but before discussing this it will be 
convenient to look at a condition equivalent to a.e. convergence. 


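2.5.4 Lemma. If $\mu$ is finite, then $f_n \to f$ a.e. iff for every $\delta > 0$,

$$\mu\left(\bigcup_{k=n}^\infty \{\omega: |f_k(\omega) - f(\omega)| \ge \delta\}\right) \to 0 \quad \text{as } n \to \infty.$$


Proof. Let $B_{n\delta} = \{\omega: |f_n(\omega) - f(\omega)| \ge \delta\}$, $B_\delta = \limsup_n B_{n\delta} = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty B_{k\delta}$. Now $\bigcup_{k=n}^\infty B_{k\delta} \downarrow B_\delta$; hence $\mu\left(\bigcup_{k=n}^\infty B_{k\delta}\right) \to \mu(B_\delta)$ as $n \to \infty$ by 1.2.7. Now

$$\{\omega: f_n(\omega) \not\to f(\omega)\} = \bigcup_{\delta > 0} B_\delta = \bigcup_{m=1}^\infty B_{1/m} \quad \text{since } B_{\delta_1} \subset B_{\delta_2} \text{ for } \delta_1 > \delta_2.$$

Therefore,

$$f_n \to f \text{ a.e.} \quad \text{iff} \quad \mu(B_\delta) = 0 \text{ for all } \delta > 0 \quad \text{iff} \quad \mu\left(\bigcup_{k=n}^\infty B_{k\delta}\right) \to 0 \text{ for all } \delta > 0. \quad \Box$$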
k=n 


2.5.5 Egoroff's Theorem. If $\mu$ is finite and $f_n \to f$ a.e., then $f_n \to f$ almost uniformly. Hence by 2.5.2, if $\mu$ is finite, then almost everywhere convergence implies convergence in measure.


Proof. It follows from 2.5.4 that given $\varepsilon > 0$ and a positive integer $j$, for sufficiently large $n = n(j)$, the set $A_j = \bigcup_{k=n(j)}^\infty \{|f_k - f| \ge 1/j\}$ has measure less than $\varepsilon/2^j$. If $A = \bigcup_{j=1}^\infty A_j$, then $\mu(A) \le \sum_{j=1}^\infty \mu(A_j) < \varepsilon$. Also, if $\delta > 0$ and $j$ is chosen so that $1/j < \delta$, we have, for any $k \ge n(j)$ and $\omega \in A^c$ (hence $\omega \notin A_j$), $|f_k(\omega) - f(\omega)| < 1/j < \delta$. Thus $f_n \to f$ uniformly on $A^c$. □
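
A small illustration of Egoroff's theorem, chosen here as an assumption and not taken from the text: on $[0, 1]$ with Lebesgue measure, $f_n(x) = x^n$ converges to 0 for every $x < 1$, hence a.e., but not uniformly on $[0, 1]$. Removing the set $(1 - \varepsilon, 1]$ of measure $\varepsilon$ restores uniform convergence, since the supremum of $x^n$ over $[0, 1 - \varepsilon]$ is $(1 - \varepsilon)^n$. The sketch below computes that supremum; the helper names are hypothetical.

```python
def sup_on_complement(n, eps):
    """Sup of |x**n - 0| over [0, 1 - eps].

    x**n is increasing on [0, 1], so the sup is attained at the right endpoint.
    """
    return (1.0 - eps) ** n


def first_index_below(delta, eps):
    """Smallest n with sup over [0, 1 - eps] of x**n strictly below delta (eps > 0)."""
    n = 1
    while sup_on_complement(n, eps) >= delta:
        n += 1
    return n


# With nothing removed (eps = 0) the sup stays equal to 1 for every n,
# so the convergence is not uniform on all of [0, 1].
assert all(sup_on_complement(n, 0.0) == 1.0 for n in (1, 10, 1000))
```

For any tolerance `delta` and any `eps > 0`, `first_index_below(delta, eps)` returns a finite index, which is the uniform convergence on the complement of a set of measure `eps`.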


We now give some examples to illustrate the relations between the various types of convergence. In all cases, we assume that $\mathcal{F}$ is the class of Borel sets and $\mu$ is Lebesgue measure.


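2.5.6 Examples. (a) Let $\Omega = [0, 1]$ and define

$$f_n(x) = \begin{cases} e^n & \text{if } 0 \le x \le \dfrac{1}{n}, \\ 0 & \text{elsewhere.} \end{cases}$$

Then $f_n \to 0$ a.e., hence in measure by 2.5.5. But for each $p \in (0, \infty]$, $f_n$ fails to converge in $L^p$. For if $p < \infty$,

$$\|f_n\|_p^p = \int_0^1 |f_n(x)|^p\, dx = \frac{1}{n} e^{np} \to \infty,$$

and

$$\|f_n\|_\infty = e^n \to \infty.$$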

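(b) Let $\Omega = \mathbb{R}$, and define

$$f_n(x) = \begin{cases} \dfrac{1}{n} & \text{if } 0 \le x \le e^n, \\ 0 & \text{elsewhere.} \end{cases}$$

Then $f_n \to 0$ uniformly on $\mathbb{R}$, so that $f_n \xrightarrow{L^\infty} 0$. It follows quickly that $f_n \to 0$ a.e. and in measure. But for each $p \in (0, \infty)$, $f_n$ fails to converge in $L^p$, since $\|f_n\|_p^p = n^{-p} e^n \to \infty$.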


(c) Let $\Omega = [0, \infty)$ and define

$$f_n(x) = \begin{cases} 1 & \text{if } n \le x \le n + \dfrac{1}{n}, \\ 0 & \text{elsewhere.} \end{cases}$$

Then $f_n \to 0$ a.e. and in measure (as well as in $L^p$, $0 < p < \infty$), but does not converge almost uniformly. For, if $f_n \to 0$ uniformly on $A$ and $\mu(A^c) < \varepsilon$, then eventually $f_n < 1$ on $A$; hence if $A_n = [n, n + (1/n)]$ we have $A \cap \bigcup_{k \ge n} A_k = \emptyset$ for sufficiently large $n$. Therefore, $A^c \supset \bigcup_{k \ge n} A_k$, and consequently $\mu(A^c) \ge \sum_{k \ge n} \mu(A_k) = \infty$, a contradiction. Note that if we change $f_n(x)$ so that it is 1 for $n \le x \le n + 1$ and 0 elsewhere, then $f_n$ converges to 0 almost everywhere but not in measure, hence not almost uniformly.


(d) Let $\Omega = [0, 1]$, and define

$$f_{nm}(x) = \begin{cases} 1 & \text{if } \dfrac{m-1}{n} < x \le \dfrac{m}{n}, \quad m = 1, \ldots, n, \quad n = 1, 2, \ldots, \\ 0 & \text{elsewhere.} \end{cases}$$

Then $\|f_{nm}\|_p^p = 1/n \to 0$, so for each $p \in (0, \infty)$, the sequence $f_{11}, f_{21}, f_{22}, f_{31}, f_{32}, f_{33}, \ldots$ converges to 0 in $L^p$ (hence converges in measure by 2.5.1). But the sequence does not converge a.e., hence by 2.5.2, does not converge almost uniformly. To see this, observe that for any $x \ne 0$, the sequence $\{f_{nm}(x)\}$ has infinitely many zeros and infinitely many ones. Thus the set on which $f_{nm}$ converges has measure 0. Also, $f_{nm}$ does not converge in $L^\infty$, for if $f_{nm} \xrightarrow{L^\infty} f$, then $f_{nm} \xrightarrow{\mu} f$, hence $f = 0$ a.e. (see Problem 1). But $\|f_{nm}\|_\infty = 1$, a contradiction.
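
The behavior in Example (d) is easy to see by evaluating the sequence at one point. The following is a minimal sketch with hypothetical helper names, using exact rational arithmetic; it checks that every block $f_{n1}, \ldots, f_{nn}$ with $n \ge 2$ takes both the value 1 and the value 0 at a fixed $x \in (0, 1]$, so the sequence cannot converge there.

```python
from fractions import Fraction


def f(n, m, x):
    """Indicator of the interval ((m-1)/n, m/n], for 1 <= m <= n."""
    return 1 if Fraction(m - 1, n) < x <= Fraction(m, n) else 0


def row_values(n, x):
    """Values f_{n1}(x), ..., f_{nn}(x) of the n-th block of the sequence."""
    return [f(n, m, x) for m in range(1, n + 1)]


x = Fraction(1, 3)  # any point of (0, 1] behaves the same way
for n in range(2, 50):
    row = row_values(n, x)
    # The n intervals partition (0, 1], so exactly one of them contains x.
    assert sum(row) == 1 and 0 in row
```

Since each interval has length $1/n$, the $L^p$ norms still tend to 0, which is the point of the example.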


Problems 
1. If $f_n$ converges to both $f$ and $g$ in measure, show that $f = g$ a.e.

2. Show that a sequence is Cauchy in measure iff it is convergent in measure.

3. (a) If $\mu$ is finite, show that $L^\infty$ convergence implies $L^p$ convergence for all $p \in (0, \infty)$.
(b) Show that any real-valued function in $L^p[a, b]$, $-\infty < a < b < \infty$, $0 < p < \infty$, can be approximated in $L^p$ by a polynomial, in fact by a polynomial with rational coefficients.

4. If $\mu$ is finite, show that $\{f_n\}$ is Cauchy a.e. (for almost every $\omega$, $\{f_n(\omega)\}$ is a Cauchy sequence) iff for every $\delta > 0$,

$$\mu\left[\bigcup_{j,k=n}^\infty \{\omega: |f_j(\omega) - f_k(\omega)| \ge \delta\}\right] \to 0 \quad \text{as } n \to \infty.$$

5. (Extension of the dominated convergence theorem) If $|f_n| \le g$ for all $n = 1, 2, \ldots$, where $g$ is $\mu$-integrable, and $f_n \xrightarrow{\mu} f$, show that $f$ is $\mu$-integrable and $\int_\Omega f_n\, d\mu \to \int_\Omega f\, d\mu$.

6. A metric may be defined on the space of all measurable functions on $(\Omega, \mathcal{F}, \mu)$ by $d(f, g) = \int_\Omega \frac{|f - g|}{1 + |f - g|}\, d\mu$. (Functions that agree almost everywhere are identified.)
(a) If $d(f_n, f) \to 0$, show that $f_n \xrightarrow{\mu} f$.
(b) If $\mu$ is finite, show that $f_n \xrightarrow{\mu} f$ implies $d(f_n, f) \to 0$.
(c) Give an example in which $f_n \xrightarrow{\mu} f$ but $d(f_n, f)$ does not approach 0.

7. If $\mu$ is a finite measure and for every $\varepsilon > 0$, $\sum_{n=1}^\infty \mu\{|f_n - f| \ge \varepsilon\} < \infty$, show that $f_n \to f$ almost everywhere. Thus by Chebyshev's inequality, $\sum_{n=1}^\infty \|f_n - f\|_p^p < \infty$ implies that $f_n \to f$ almost everywhere.


2.6 PRODUCT MEASURES AND FUBINI'S THEOREM

Lebesgue measure on $\mathbb{R}^n$ is in a sense the product of $n$ copies of one-dimensional Lebesgue measure, because the volume of an $n$-dimensional rectangular box is the product of the lengths of the sides. In this section we develop this idea in a general setting. We shall be interested in two constructions. First, suppose that $(\Omega_j, \mathcal{F}_j, \mu_j)$ is a measure space for $j = 1, 2, \ldots, n$. We wish to construct a measure on subsets of $\Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$ such that the measure of the "rectangle" $A_1 \times A_2 \times \cdots \times A_n$ [with each $A_j \in \mathcal{F}_j$] is $\mu_1(A_1)\mu_2(A_2) \cdots \mu_n(A_n)$. The second construction involves compound experiments in probability. Suppose that two observations are made, with the first observation resulting in a point $\omega_1 \in \Omega_1$, the second in a point $\omega_2 \in \Omega_2$. The probability that the first observation falls into the set $A$ is, say, $\mu_1(A)$. Furthermore, if the first observation is $\omega_1$, the probability that the second observation falls into $B$ is, say, $\mu(\omega_1, B)$, where $\mu(\omega_1, \cdot)$ is a probability measure defined on $\mathcal{F}_2$ for each $\omega_1 \in \Omega_1$. The probability that the first observation will belong to $A$ and the second will belong to $B$ should be given by

$$\mu(A \times B) = \int_A \mu(\omega_1, B)\, \mu_1(d\omega_1),$$

and we would like to construct a probability measure on subsets of $\Omega_1 \times \Omega_2$ such that $\mu(A \times B)$ is given by this formula for each $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$. [Intuitively, the probability that the first observation will fall near $\omega_1$ is $\mu_1(d\omega_1)$; given that the first observation is $\omega_1$, the second observation will fall in $B$ with probability $\mu(\omega_1, B)$. Thus $\mu(\omega_1, B)\mu_1(d\omega_1)$ represents the probability of one possible favorable outcome of the experiment. The total probability is found by adding the probabilities of favorable outcomes, in other words, by integrating over $A$. Reasoning of this type may not appear natural at this point, since we have not yet talked in detail about probability theory. However, it may serve to indicate the motivation behind the theorems of this section.]
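
On finite spaces the integral over $A$ is a finite sum, and the compound-experiment formula can be computed exactly. The sketch below assumes a hypothetical first-stage distribution `mu1` on $\{a, b\}$ and a hypothetical family `kernel` of second-stage distributions on $\{0, 1, 2\}$; the numbers are illustrative only.

```python
from fractions import Fraction

# Hypothetical first-stage distribution mu1 on Omega_1 = {"a", "b"}.
mu1 = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
# Hypothetical kernel: for each w1, a probability measure mu(w1, .) on Omega_2 = {0, 1, 2}.
kernel = {
    "a": {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)},
    "b": {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)},
}


def kernel_measure(w1, B):
    """mu(w1, B) for a subset B of Omega_2."""
    return sum(kernel[w1][w2] for w2 in B)


def rectangle_probability(A, B):
    """mu(A x B): the integral over A of mu(w1, B) mu1(dw1), a finite sum here."""
    return sum(kernel_measure(w1, B) * mu1[w1] for w1 in A)


# The whole space has probability 1, as it should for a two-stage experiment.
assert rectangle_probability({"a", "b"}, {0, 1, 2}) == 1
```

For instance, `rectangle_probability({"b"}, {2})` is the chance that the first observation is `b` and the second is 2, namely $\tfrac{2}{3} \cdot \tfrac{1}{2}$.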
2.6.1 Definition. Let $\mathcal{F}_j$ be a $\sigma$-field of subsets of $\Omega_j$, $j = 1, 2, \ldots, n$, and let $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$. A measurable rectangle in $\Omega$ is a set $A = A_1 \times A_2 \times \cdots \times A_n$, where $A_j \in \mathcal{F}_j$ for each $j = 1, 2, \ldots, n$. The smallest $\sigma$-field containing the measurable rectangles is called the product $\sigma$-field, written $\mathcal{F}_1 \times \mathcal{F}_2 \times \cdots \times \mathcal{F}_n$. If all $\mathcal{F}_j$ coincide with a fixed $\sigma$-field $\mathcal{F}$, the product $\sigma$-field is denoted by $\mathcal{F}^n$. Note that in spite of the notation, $\mathcal{F}_1 \times \mathcal{F}_2 \times \cdots \times \mathcal{F}_n$ is not the Cartesian product of the $\mathcal{F}_j$; the Cartesian product is the set of measurable rectangles, while the product $\sigma$-field is the minimal $\sigma$-field over the measurable rectangles. Note also that the collection of finite disjoint unions of measurable rectangles forms a field (see Problem 1).

The next theorem is stated in such a way that both constructions described 
above become special cases. 


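2.6.2 Product Measure Theorem. Let $(\Omega_1, \mathcal{F}_1, \mu_1)$ be a measure space, with $\mu_1$ $\sigma$-finite on $\mathcal{F}_1$, and let $\Omega_2$ be a set with $\sigma$-field $\mathcal{F}_2$. Assume that for each $\omega_1 \in \Omega_1$ we are given a measure $\mu(\omega_1, \cdot)$ on $\mathcal{F}_2$. Assume that $\mu(\omega_1, B)$, besides being a measure in $B$ for each fixed $\omega_1 \in \Omega_1$, is Borel measurable in $\omega_1$ for each fixed $B \in \mathcal{F}_2$. Assume that the $\mu(\omega_1, \cdot)$ are uniformly $\sigma$-finite; that is, $\Omega_2$ can be written as $\bigcup_{n=1}^\infty B_n$, where for some positive (finite) constants $k_n$ we have $\mu(\omega_1, B_n) \le k_n$ for all $\omega_1 \in \Omega_1$. [The case in which the $\mu(\omega_1, \cdot)$ are uniformly bounded, that is, $\mu(\omega_1, \Omega_2) \le k < \infty$ for all $\omega_1$, is of course included.] Then there is a unique measure $\mu$ on $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$ such that

$$\mu(A \times B) = \int_A \mu(\omega_1, B)\, \mu_1(d\omega_1) \quad \text{for all } A \in \mathcal{F}_1,\ B \in \mathcal{F}_2,$$

namely,

$$\mu(F) = \int_{\Omega_1} \mu(\omega_1, F(\omega_1))\, \mu_1(d\omega_1), \quad F \in \mathcal{F},$$

where $F(\omega_1)$ denotes the section of $F$ at $\omega_1$:

$$F(\omega_1) = \{\omega_2 \in \Omega_2: (\omega_1, \omega_2) \in F\}.$$

Furthermore, $\mu$ is $\sigma$-finite on $\mathcal{F}$; if $\mu_1$ and all the $\mu(\omega_1, \cdot)$ are probability measures, so is $\mu$.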


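Proof. First assume that the $\mu(\omega_1, \cdot)$ are finite.

(1) If $C \in \mathcal{F}$, then $C(\omega_1) \in \mathcal{F}_2$ for each $\omega_1 \in \Omega_1$.

To prove this, let $\mathcal{C} = \{C \in \mathcal{F}: C(\omega_1) \in \mathcal{F}_2\}$. Then $\mathcal{C}$ is a $\sigma$-field since

$$\left(\bigcup_{n=1}^\infty C_n\right)(\omega_1) = \bigcup_{n=1}^\infty C_n(\omega_1), \qquad C^c(\omega_1) = (C(\omega_1))^c.$$

If $A \in \mathcal{F}_1$, $B \in \mathcal{F}_2$, then $(A \times B)(\omega_1) = B$ if $\omega_1 \in A$ and $\emptyset$ if $\omega_1 \notin A$. Thus $\mathcal{C}$ contains all measurable rectangles; hence $\mathcal{C} = \mathcal{F}$.

(2) If $C \in \mathcal{F}$, then $\mu(\omega_1, C(\omega_1))$ is Borel measurable in $\omega_1$.

To prove this, let $\mathcal{C}$ be the class of sets in $\mathcal{F}$ for which the conclusion of (2) holds. If $C = A \times B$, $A \in \mathcal{F}_1$, $B \in \mathcal{F}_2$, then

$$\mu(\omega_1, C(\omega_1)) = \begin{cases} \mu(\omega_1, B) & \text{if } \omega_1 \in A, \\ \mu(\omega_1, \emptyset) = 0 & \text{if } \omega_1 \notin A. \end{cases}$$

Thus $\mu(\omega_1, C(\omega_1)) = \mu(\omega_1, B) I_A(\omega_1)$, and is Borel measurable by hypothesis. Therefore measurable rectangles belong to $\mathcal{C}$. If $C_1, \ldots, C_n$ are disjoint measurable rectangles,

$$\mu\left(\omega_1, \left(\bigcup_{i=1}^n C_i\right)(\omega_1)\right) = \mu\left(\omega_1, \bigcup_{i=1}^n C_i(\omega_1)\right) = \sum_{i=1}^n \mu(\omega_1, C_i(\omega_1))$$

is a finite sum of Borel measurable functions, and hence is Borel measurable in $\omega_1$. Thus $\mathcal{C}$ contains the field of finite disjoint unions of measurable rectangles. But $\mathcal{C}$ is a monotone class, for if $C_n \in \mathcal{C}$, $n = 1, 2, \ldots$, and $C_n \uparrow C$, then $C_n(\omega_1) \uparrow C(\omega_1)$; hence $\mu(\omega_1, C_n(\omega_1)) \to \mu(\omega_1, C(\omega_1))$. Thus $\mu(\omega_1, C(\omega_1))$, a limit of measurable functions, is measurable in $\omega_1$. If $C_n \downarrow C$, the same conclusion holds since the $\mu(\omega_1, \cdot)$ are finite. Thus $\mathcal{C} = \mathcal{F}$.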


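(3) Define

$$\mu(F) = \int_{\Omega_1} \mu(\omega_1, F(\omega_1))\, \mu_1(d\omega_1), \quad F \in \mathcal{F}$$

[the integral exists by (2)]. Then $\mu$ is a measure on $\mathcal{F}$, and

$$\mu(A \times B) = \int_A \mu(\omega_1, B)\, \mu_1(d\omega_1) \quad \text{for all } A \in \mathcal{F}_1,\ B \in \mathcal{F}_2.$$

To prove this, let $F_1, F_2, \ldots$ be disjoint sets in $\mathcal{F}$. Then

$$\mu\left(\bigcup_{n=1}^\infty F_n\right) = \int_{\Omega_1} \mu\left(\omega_1, \bigcup_{n=1}^\infty F_n(\omega_1)\right) \mu_1(d\omega_1) = \int_{\Omega_1} \sum_{n=1}^\infty \mu(\omega_1, F_n(\omega_1))\, \mu_1(d\omega_1)$$

$$= \sum_{n=1}^\infty \int_{\Omega_1} \mu(\omega_1, F_n(\omega_1))\, \mu_1(d\omega_1) = \sum_{n=1}^\infty \mu(F_n),$$

proving that $\mu$ is a measure. Now

$$\mu(A \times B) = \int_{\Omega_1} \mu(\omega_1, (A \times B)(\omega_1))\, \mu_1(d\omega_1) = \int_{\Omega_1} \mu(\omega_1, B) I_A(\omega_1)\, \mu_1(d\omega_1) \quad [\text{see (2)}]$$

$$= \int_A \mu(\omega_1, B)\, \mu_1(d\omega_1)$$

as desired.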
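Now assume the $\mu(\omega_1, \cdot)$ uniformly $\sigma$-finite. Let $\Omega_2 = \bigcup_{n=1}^\infty B_n$, where the $B_n$ are disjoint sets in $\mathcal{F}_2$ and $\mu(\omega_1, B_n) \le k_n < \infty$ for all $\omega_1 \in \Omega_1$. If we set

$$\mu_n'(\omega_1, B) = \mu(\omega_1, B \cap B_n), \quad B \in \mathcal{F}_2,$$

the $\mu_n'(\omega_1, \cdot)$ are finite, and the above construction gives a measure $\mu_n'$ on $\mathcal{F}$ such that

$$\mu_n'(A \times B) = \int_A \mu_n'(\omega_1, B)\, \mu_1(d\omega_1) = \int_A \mu(\omega_1, B \cap B_n)\, \mu_1(d\omega_1), \quad A \in \mathcal{F}_1,\ B \in \mathcal{F}_2,$$

namely,

$$\mu_n'(F) = \int_{\Omega_1} \mu_n'(\omega_1, F(\omega_1))\, \mu_1(d\omega_1) = \int_{\Omega_1} \mu(\omega_1, F(\omega_1) \cap B_n)\, \mu_1(d\omega_1).$$

Let $\mu = \sum_{n=1}^\infty \mu_n'$; $\mu$ has the desired properties.

For the uniqueness proof, assume $\mu(\omega_1, \cdot)$ to be uniformly $\sigma$-finite. If $\lambda$ is a measure on $\mathcal{F}$ such that $\lambda(A \times B) = \int_A \mu(\omega_1, B)\, \mu_1(d\omega_1)$ for all $A \in \mathcal{F}_1$, $B \in \mathcal{F}_2$, then $\lambda = \mu$ on the field $\mathcal{F}_0$ of finite disjoint unions of measurable rectangles. Now $\mu$ is $\sigma$-finite on $\mathcal{F}_0$, for if $\Omega_2 = \bigcup_{n=1}^\infty B_n$ with $B_n \in \mathcal{F}_2$ and $\mu(\omega_1, B_n) \le k_n < \infty$ for all $\omega_1$, and $\Omega_1 = \bigcup_{m=1}^\infty A_m$, where the $A_m$ belong to $\mathcal{F}_1$ and $\mu_1(A_m) < \infty$, then $\Omega_1 \times \Omega_2 = \bigcup_{m,n=1}^\infty (A_m \times B_n)$ and

$$\mu(A_m \times B_n) = \int_{A_m} \mu(\omega_1, B_n)\, \mu_1(d\omega_1) \le k_n \mu_1(A_m) < \infty.$$

Thus $\lambda = \mu$ on $\mathcal{F}$ by the Carathéodory extension theorem.

We have just seen that $\mu$ is $\sigma$-finite on $\mathcal{F}_0$, hence on $\mathcal{F}$. If $\mu_1$ and all the $\mu(\omega_1, \cdot)$ are probability measures, it is immediate that $\mu$ is also. □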
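
The section formula $\mu(F) = \int_{\Omega_1} \mu(\omega_1, F(\omega_1))\, \mu_1(d\omega_1)$ applies to sets $F$ that are not rectangles. The following is a minimal sketch on finite spaces, where every set is measurable and the integral is a finite sum; the measure `mu1` and the family `kernel` are hypothetical and deliberately not probability measures.

```python
from fractions import Fraction

# Hypothetical finite measure mu1 on Omega_1 = {"a", "b"}.
mu1 = {"a": Fraction(2), "b": Fraction(3)}
# Hypothetical kernel: for each w1, a finite measure mu(w1, .) on Omega_2 = {0, 1}.
kernel = {
    "a": {0: Fraction(1), 1: Fraction(1, 2)},
    "b": {0: Fraction(0), 1: Fraction(4)},
}


def section(F, w1):
    """F(w1) = {w2 : (w1, w2) in F}, the section of F at w1."""
    return {w2 for (v1, w2) in F if v1 == w1}


def product_measure(F):
    """mu(F) = sum over w1 of mu(w1, F(w1)) * mu1(w1)."""
    return sum(
        sum(kernel[w1][w2] for w2 in section(F, w1)) * mu1[w1]
        for w1 in mu1
    )


F = {("a", 0), ("b", 1)}   # not a rectangle
G = {("a", 1)}             # disjoint from F
# Additivity on disjoint sets, as in step (3) of the proof.
assert product_measure(F | G) == product_measure(F) + product_measure(G)
```

On the rectangle $\{a\} \times \{0, 1\}$ the same function reduces to $\mu(a, \{0, 1\})\, \mu_1(\{a\})$, in agreement with the rectangle formula of the theorem.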


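2.6.3 Corollary: Classical Product Measure Theorem. Let $(\Omega_j, \mathcal{F}_j, \mu_j)$ be a measure space for $j = 1, 2$, with $\mu_j$ $\sigma$-finite on $\mathcal{F}_j$. If $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$, the set function given by

$$\mu(F) = \int_{\Omega_1} \mu_2(F(\omega_1))\, d\mu_1(\omega_1) = \int_{\Omega_2} \mu_1(F(\omega_2))\, d\mu_2(\omega_2)$$

is the unique measure on $\mathcal{F}$ such that $\mu(A \times B) = \mu_1(A)\mu_2(B)$ for all $A \in \mathcal{F}_1$, $B \in \mathcal{F}_2$. Furthermore, $\mu$ is $\sigma$-finite on $\mathcal{F}$, and is a probability measure if $\mu_1$ and $\mu_2$ are. The measure $\mu$ is called the product of $\mu_1$ and $\mu_2$, written $\mu = \mu_1 \times \mu_2$.


Proof. In 2.6.2, take $\mu(\omega_1, \cdot) = \mu_2$ for all $\omega_1$. The second formula for $\mu(F)$ is obtained by interchanging $\mu_1$ and $\mu_2$. □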


As a special case, let $\Omega_1 = \Omega_2 = \mathbb{R}$, $\mathcal{F}_1 = \mathcal{F}_2 = \mathcal{B}(\mathbb{R})$, $\mu_1 = \mu_2 =$ Lebesgue measure. Then $\mathcal{F}_1 \times \mathcal{F}_2 = \mathcal{B}(\mathbb{R}^2)$ (Problem 2), and $\mu = \mu_1 \times \mu_2$ agrees with Lebesgue measure on intervals $(a, b] = (a_1, b_1] \times (a_2, b_2]$. By the Carathéodory extension theorem, $\mu$ is Lebesgue measure on $\mathcal{B}(\mathbb{R}^2)$, so we have another method of constructing two-dimensional Lebesgue measure. We shall generalize to $n$ dimensions later in the section.

The integration theory we have developed thus far includes the notion of a multiple integral on $\mathbb{R}^n$; this is simply an integral with respect to $n$-dimensional Lebesgue measure. However, in calculus, integrals of this type are evaluated by computing iterated integrals. The general theorem which justifies this process is Fubini's theorem, which is a direct consequence of the product measure theorem.


2.6.4 Fubini's Theorem. Assume the hypothesis of the product measure theorem 2.6.2. Let $f: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

(a) If $f$ is nonnegative, then $\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)$ exists and defines a Borel measurable function of $\omega_1$. Also

$$\int_\Omega f\, d\mu = \int_{\Omega_1} \left(\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\right) \mu_1(d\omega_1).$$

(b) If $\int_\Omega f\, d\mu$ exists (respectively, is finite), then $\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)$ exists (respectively, is finite) for $\mu_1$-almost every $\omega_1$, and defines a Borel measurable function of $\omega_1$ if it is taken as 0 (or as any Borel measurable function of $\omega_1$) on the exceptional set. Also,

$$\int_\Omega f\, d\mu = \int_{\Omega_1} \left(\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\right) \mu_1(d\omega_1).$$

[The notation $\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)$ indicates that for a fixed $\omega_1$, the function given by $g(\omega_2) = f(\omega_1, \omega_2)$ is to be integrated with respect to the measure $\mu(\omega_1, \cdot)$.]


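Proof. (a) First note that:

(1) For each fixed $\omega_1$ we have $f(\omega_1, \cdot): (\Omega_2, \mathcal{F}_2) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. In other words, if $f$ is jointly measurable, that is, measurable relative to the product $\sigma$-field $\mathcal{F}_1 \times \mathcal{F}_2$, it is measurable in each variable separately. For if $B \in \mathcal{B}(\mathbb{R})$, $\{\omega_2: f(\omega_1, \omega_2) \in B\} = \{\omega_2: (\omega_1, \omega_2) \in f^{-1}(B)\} = f^{-1}(B)(\omega_1) \in \mathcal{F}_2$ by part (1) of the proof of 2.6.2. Thus $\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)$ exists.

Now let $f = I_F$, $F \in \mathcal{F}$, be an indicator. Then

$$\int_{\Omega_2} I_F(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2) = \int_{\Omega_2} I_{F(\omega_1)}(\omega_2)\, \mu(\omega_1, d\omega_2) = \mu(\omega_1, F(\omega_1)),$$

and this is Borel measurable in $\omega_1$ by part (2) of the proof of 2.6.2. Also

$$\int_\Omega I_F\, d\mu = \mu(F) = \int_{\Omega_1} \mu(\omega_1, F(\omega_1))\, \mu_1(d\omega_1) \quad \text{by 2.6.2}$$

$$= \int_{\Omega_1} \int_{\Omega_2} I_F(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1).$$

Now if $f = \sum_{j=1}^n x_j I_{F_j}$, the $F_j$ disjoint sets in $\mathcal{F}$, is a nonnegative simple function, then

$$\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2) = \sum_{j=1}^n x_j \mu(\omega_1, F_j(\omega_1)),$$

Borel measurable in $\omega_1$, and

$$\int_\Omega f\, d\mu = \sum_{j=1}^n x_j \mu(F_j) = \sum_{j=1}^n x_j \int_{\Omega_1} \int_{\Omega_2} I_{F_j}(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1)$$

by what we have proved for indicators

$$= \int_{\Omega_1} \int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1).$$

Finally, if $f: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, $f \ge 0$, let $0 \le f_n \uparrow f$, $f_n$ simple. Then

$$\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2) = \lim_{n \to \infty} \int_{\Omega_2} f_n(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2),$$

which is Borel measurable in $\omega_1$, and

$$\int_\Omega f\, d\mu = \lim_{n \to \infty} \int_\Omega f_n\, d\mu = \lim_{n \to \infty} \int_{\Omega_1} \int_{\Omega_2} f_n(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1)$$

by what we have proved for simple functions

$$= \int_{\Omega_1} \int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1)$$

using the monotone convergence theorem twice.

This proves (a).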
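(b) Suppose that $\int_\Omega f^-\, d\mu < \infty$ (the case $\int_\Omega f^+\, d\mu < \infty$ is handled in the same way). By (a),

$$\int_{\Omega_1} \int_{\Omega_2} f^-(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1) = \int_\Omega f^-\, d\mu < \infty$$

so that $\int_{\Omega_2} f^-(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)$ is $\mu_1$-integrable, hence finite a.e. $[\mu_1]$; thus:

(2) For $\mu_1$-almost every $\omega_1$ we may write:

$$\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2) = \int_{\Omega_2} f^+(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2) - \int_{\Omega_2} f^-(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2).$$

If $\int_\Omega f\, d\mu$ is finite, both integrals on the right side of (2) are finite a.e. $[\mu_1]$. In any event, we may define all integrals in (2) to be 0 (or any other Borel measurable function of $\omega_1$) on the exceptional set, and (2) will then be valid for all $\omega_1$, and will define a Borel measurable function of $\omega_1$. If we integrate (2) with respect to $\mu_1$, we obtain, by (a) and the additivity theorem for integrals,

$$\int_{\Omega_1} \int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1) = \int_\Omega f^+\, d\mu - \int_\Omega f^-\, d\mu = \int_\Omega f\, d\mu. \quad \Box$$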
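
On finite spaces every function is simple, so the simple-function step of the proof can be replayed exactly. The sketch below assumes a hypothetical measure `mu1`, a hypothetical family `kernel`, and a hypothetical integrand `f`; it computes $\int_\Omega f\, d\mu$ as $\sum_x x\, \mu\{f = x\}$, with $\mu$ given by the section formula of 2.6.2, and compares it with the iterated integral.

```python
from fractions import Fraction

# Hypothetical first-stage measure and kernel (illustrative values only).
mu1 = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
kernel = {
    "a": {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)},
    "b": {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)},
}
points = [(w1, w2) for w1 in mu1 for w2 in kernel[w1]]


def f(w1, w2):
    """Hypothetical integrand; it takes negative values, so part (b) is the relevant case."""
    return Fraction(w2 * w2 - 1) if w1 == "a" else Fraction(w2 + 2)


def product_measure(F):
    """mu(F) via sections: sum over w1 of mu(w1, F(w1)) * mu1(w1)."""
    return sum(
        sum(kernel[w1][w2] for (v1, w2) in F if v1 == w1) * mu1[w1]
        for w1 in mu1
    )


def integral_wrt_product_measure():
    """Integral of the simple function f: sum over its values x of x * mu{f = x}."""
    values = {f(w1, w2) for (w1, w2) in points}
    return sum(
        x * product_measure({pt for pt in points if f(*pt) == x})
        for x in values
    )


def iterated_integral():
    """Inner integral against mu(w1, .), then outer integral against mu1."""
    return sum(
        mu1[w1] * sum(f(w1, w2) * kernel[w1][w2] for w2 in kernel[w1])
        for w1 in mu1
    )


assert integral_wrt_product_measure() == iterated_integral()
```

With these particular numbers both sides equal 2; the agreement does not depend on the choice of values.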


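2.6.5 Corollary. If $f: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and the iterated integral $\int_{\Omega_1} \int_{\Omega_2} |f(\omega_1, \omega_2)|\, \mu(\omega_1, d\omega_2)\, \mu_1(d\omega_1) < \infty$, then $\int_\Omega f\, d\mu$ is finite, and thus Fubini's theorem applies.


Proof. By 2.6.4(a), $\int_\Omega |f|\, d\mu < \infty$, and thus the hypothesis of 2.6.4(b) is satisfied. □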


As a special case, we obtain the following classical result. 


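2.6.6 Classical Fubini Theorem. Let $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$, $\mu = \mu_1 \times \mu_2$, where $\mu_j$ is a $\sigma$-finite measure on $\mathcal{F}_j$, $j = 1, 2$. If $f$ is a Borel measurable function on $(\Omega, \mathcal{F})$ such that $\int_\Omega f\, d\mu$ exists, then

$$\int_\Omega f\, d\mu = \int_{\Omega_1} \int_{\Omega_2} f\, d\mu_2\, d\mu_1 = \int_{\Omega_2} \int_{\Omega_1} f\, d\mu_1\, d\mu_2 \quad \text{by symmetry.}$$

Proof. Apply 2.6.4 with $\mu(\omega_1, \cdot) = \mu_2$ for all $\omega_1$. □


Note that by 2.6.5, if $\int_{\Omega_1} \int_{\Omega_2} |f|\, d\mu_2\, d\mu_1 < \infty$ (or $\int_{\Omega_2} \int_{\Omega_1} |f|\, d\mu_1\, d\mu_2 < \infty$), the iterated integration formula 2.6.6 holds.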

In 2.6.4(b), if we wish to define $\int_{\Omega_2} f(\omega_1, \omega_2)\, \mu(\omega_1, d\omega_2)$ in a completely arbitrary fashion on the exceptional set where the integral does not exist, and still produce a Borel measurable function of $\omega_1$, we should assume that $(\Omega_1, \mathcal{F}_1, \mu_1)$ is a complete measure space. The situation is as follows. We have $h: (\Omega_1, \mathcal{F}_1) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, where $h$ is the above integral, taken as 0 on the exceptional set $A$. We set $g(\omega_1) = h(\omega_1)$, $\omega_1 \notin A$; $g(\omega_1) = q(\omega_1)$ arbitrary, $\omega_1 \in A$ ($q$ not necessarily Borel measurable). If $B$ is a Borel subset of $\mathbb{R}$, then

$$\{\omega_1: g(\omega_1) \in B\} = [A^c \cap \{\omega_1: h(\omega_1) \in B\}] \cup [A \cap \{\omega_1: q(\omega_1) \in B\}].$$

The first set of the union belongs to $\mathcal{F}_1$, and the second is a subset of $A$, with $\mu_1(A) = 0$, and hence belongs to $\mathcal{F}_1$ by completeness. Thus $g$ is Borel measurable.

In the classical Fubini theorem, if we want to define $\int_{\Omega_2} f(\omega_1, \omega_2)\, d\mu_2(\omega_2)$ and $\int_{\Omega_1} f(\omega_1, \omega_2)\, d\mu_1(\omega_1)$ in a completely arbitrary fashion on the exceptional sets, we should assume completeness of both spaces $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$.

The product measure theorem and Fubini’s theorem may be extended to n 
factors, as follows. 


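2.6.7 Theorem. Let $\mathcal{F}_j$ be a $\sigma$-field of subsets of $\Omega_j$, $j = 1, \ldots, n$. Let $\mu_1$ be a $\sigma$-finite measure on $\mathcal{F}_1$, and, for each $(\omega_1, \ldots, \omega_j) \in \Omega_1 \times \cdots \times \Omega_j$, let $\mu(\omega_1, \ldots, \omega_j, B)$, $B \in \mathcal{F}_{j+1}$, be a measure on $\mathcal{F}_{j+1}$ $(j = 1, 2, \ldots, n - 1)$. Assume the $\mu(\omega_1, \ldots, \omega_j, \cdot)$ to be uniformly $\sigma$-finite, and assume that $\mu(\omega_1, \ldots, \omega_j, C)$ is measurable: $(\Omega_1 \times \cdots \times \Omega_j, \mathcal{F}_1 \times \cdots \times \mathcal{F}_j) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ for each fixed $C \in \mathcal{F}_{j+1}$.

Let $\Omega = \Omega_1 \times \cdots \times \Omega_n$, $\mathcal{F} = \mathcal{F}_1 \times \cdots \times \mathcal{F}_n$.

(a) There is a unique measure $\mu$ on $\mathcal{F}$ such that for each measurable rectangle $A_1 \times \cdots \times A_n \in \mathcal{F}$,

$$\mu(A_1 \times \cdots \times A_n) = \int_{A_1} \mu_1(d\omega_1) \int_{A_2} \mu(\omega_1, d\omega_2) \cdots \int_{A_{n-1}} \mu(\omega_1, \ldots, \omega_{n-2}, d\omega_{n-1}) \int_{A_n} \mu(\omega_1, \ldots, \omega_{n-1}, d\omega_n).$$

[Note that the last factor on the right is $\mu(\omega_1, \ldots, \omega_{n-1}, A_n)$.] The measure $\mu$ is $\sigma$-finite on $\mathcal{F}$, and is a probability measure if $\mu_1$ and all the $\mu(\omega_1, \ldots, \omega_j, \cdot)$ are probability measures.

(b) Let $f: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. If $f \ge 0$, then

$$\int_\Omega f\, d\mu = \int_{\Omega_1} \mu_1(d\omega_1) \int_{\Omega_2} \mu(\omega_1, d\omega_2) \cdots \int_{\Omega_{n-1}} \mu(\omega_1, \ldots, \omega_{n-2}, d\omega_{n-1}) \int_{\Omega_n} f(\omega_1, \ldots, \omega_n)\, \mu(\omega_1, \ldots, \omega_{n-1}, d\omega_n), \tag{1}$$

where, after the integration with respect to $\mu(\omega_1, \ldots, \omega_j, \cdot)$ is performed $(j = n - 1, n - 2, \ldots, 1)$, the result is a Borel measurable function of $(\omega_1, \ldots, \omega_j)$.

If $\int_\Omega f\, d\mu$ exists (respectively, is finite), then Eq. (1) holds in the sense that for each $j = n - 1, n - 2, \ldots, 1$, the integral with respect to $\mu(\omega_1, \ldots, \omega_j, \cdot)$ exists (respectively, is finite) except for $(\omega_1, \ldots, \omega_j)$ in a set of $\lambda_j$-measure 0, where $\lambda_j$ is the measure determined [see (a)] by $\mu_1$ and the measures $\mu(\omega_1, \cdot), \ldots, \mu(\omega_1, \ldots, \omega_{j-1}, \cdot)$. If the integral is defined on the exceptional set as 0 [or any Borel measurable function on the space $(\Omega_1 \times \cdots \times \Omega_j, \mathcal{F}_1 \times \cdots \times \mathcal{F}_j)$], it becomes Borel measurable in $(\omega_1, \ldots, \omega_j)$.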


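Proof. By 2.6.2 and 2.6.4, the result holds for $n = 2$. Assuming that (a) and (b) hold up to $n - 1$ factors, we consider the $n$-dimensional case.

By the induction hypothesis, there is a unique measure $\lambda_{n-1}$ on $\mathcal{F}_1 \times \cdots \times \mathcal{F}_{n-1}$ such that for all $A_1 \in \mathcal{F}_1, \ldots, A_{n-1} \in \mathcal{F}_{n-1}$,

$$\lambda_{n-1}(A_1 \times \cdots \times A_{n-1}) = \int_{A_1} \mu_1(d\omega_1) \int_{A_2} \mu(\omega_1, d\omega_2) \cdots \int_{A_{n-1}} \mu(\omega_1, \ldots, \omega_{n-2}, d\omega_{n-1})$$

and $\lambda_{n-1}$ is $\sigma$-finite. By the $n = 2$ case, there is a unique measure $\mu$ on $(\mathcal{F}_1 \times \cdots \times \mathcal{F}_{n-1}) \times \mathcal{F}_n$ (which equals $\mathcal{F}_1 \times \cdots \times \mathcal{F}_n$; see Problem 3) such that for each $A \in \mathcal{F}_1 \times \cdots \times \mathcal{F}_{n-1}$, $A_n \in \mathcal{F}_n$,

$$\mu(A \times A_n) = \int_A \mu(\omega_1, \ldots, \omega_{n-1}, A_n)\, d\lambda_{n-1}(\omega_1, \ldots, \omega_{n-1})$$

$$= \int_{\Omega_1 \times \cdots \times \Omega_{n-1}} I_A(\omega_1, \ldots, \omega_{n-1})\, \mu(\omega_1, \ldots, \omega_{n-1}, A_n)\, d\lambda_{n-1}(\omega_1, \ldots, \omega_{n-1}). \tag{2}$$

If $A$ is a measurable rectangle $A_1 \times \cdots \times A_{n-1}$, then $I_A(\omega_1, \ldots, \omega_{n-1}) = I_{A_1}(\omega_1) \cdots I_{A_{n-1}}(\omega_{n-1})$; thus (2) becomes, with the aid of the induction hypothesis on (b),

$$\mu(A_1 \times \cdots \times A_n) = \int_{A_1} \mu_1(d\omega_1) \cdots \int_{A_{n-1}} \mu(\omega_1, \ldots, \omega_{n-1}, A_n)\, \mu(\omega_1, \ldots, \omega_{n-2}, d\omega_{n-1}),$$

which proves the existence of the desired measure $\mu$ on $\mathcal{F}$. To show that $\mu$ is $\sigma$-finite on $\mathcal{F}_0$, the field of finite disjoint unions of measurable rectangles, and, consequently, that $\mu$ is unique, let $\Omega_j = \bigcup_{r=1}^\infty A_{jr}$, $j = 1, \ldots, n$, where $\mu(\omega_1, \ldots, \omega_{j-1}, A_{jr}) \le k_{jr} < \infty$ for all $\omega_1, \ldots, \omega_{j-1}$, $j = 2, \ldots, n$, and $\mu_1(A_{1r}) = k_{1r} < \infty$. Then

$$\Omega = \bigcup_{i_1, \ldots, i_n = 1}^\infty (A_{1 i_1} \times A_{2 i_2} \times \cdots \times A_{n i_n}),$$

with

$$\mu(A_{1 i_1} \times A_{2 i_2} \times \cdots \times A_{n i_n}) \le k_{1 i_1} k_{2 i_2} \cdots k_{n i_n} < \infty.$$

This proves (a).

To prove (b), note that the measure $\mu$ constructed in (a) is determined by $\lambda_{n-1}$ and the measures $\mu(\omega_1, \ldots, \omega_{n-1}, \cdot)$. Thus by the $n = 2$ case,

$$\int_\Omega f\, d\mu = \int_{\Omega_1 \times \cdots \times \Omega_{n-1}} \int_{\Omega_n} f(\omega_1, \ldots, \omega_n)\, \mu(\omega_1, \ldots, \omega_{n-1}, d\omega_n)\, d\lambda_{n-1}(\omega_1, \ldots, \omega_{n-1})$$

where the inner integral is Borel measurable in $(\omega_1, \ldots, \omega_{n-1})$, or becomes so after adjustment on a set of $\lambda_{n-1}$-measure 0. The desired result now follows by the induction hypothesis. □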


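2.6.8 Comments. (a) If we take $f = I_F$ in formula (1) of 2.6.7(b), we obtain an explicit formula for $\mu(F)$, $F \in \mathcal{F}$, namely,

$$\mu(F) = \int_{\Omega_1} \mu_1(d\omega_1) \int_{\Omega_2} \mu(\omega_1, d\omega_2) \cdots \int_{\Omega_{n-1}} \mu(\omega_1, \ldots, \omega_{n-2}, d\omega_{n-1}) \int_{\Omega_n} I_F(\omega_1, \ldots, \omega_n)\, \mu(\omega_1, \ldots, \omega_{n-1}, d\omega_n).$$

(b) We obtain the classical product measure and Fubini theorems by taking $\mu(\omega_1, \ldots, \omega_j, \cdot) \equiv \mu_{j+1}$, $j = 1, 2, \ldots, n - 1$ (with $\mu_{j+1}$ $\sigma$-finite). We obtain a unique measure $\mu$ on $\mathcal{F}$ such that on measurable rectangles,

$$\mu(A_1 \times \cdots \times A_n) = \mu_1(A_1)\mu_2(A_2) \cdots \mu_n(A_n).$$

If $f: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $f \ge 0$ or $\int_\Omega f\, d\mu$ exists, then

$$\int_\Omega f\, d\mu = \int_{\Omega_1} d\mu_1 \int_{\Omega_2} d\mu_2 \cdots \int_{\Omega_n} f\, d\mu_n$$

and by symmetry, the integration may be performed in any order. The measure $\mu$ is called the product of $\mu_1, \ldots, \mu_n$, written $\mu = \mu_1 \times \cdots \times \mu_n$. In particular, if each $\mu_j$ is Lebesgue measure on $\mathcal{B}(\mathbb{R})$, then $\mu_1 \times \cdots \times \mu_n$ is Lebesgue measure on $\mathcal{B}(\mathbb{R}^n)$, just as in the discussion after 2.6.3.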
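
The remark that the integration may be performed in any order can be tried on a small case. This is a minimal sketch assuming three hypothetical finite measures on finite sets and a hypothetical integrand `f`; it evaluates the iterated integral for all six orders of integration and checks that they agree.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical finite measures mu_1, mu_2, mu_3 (point -> mass); illustrative values only.
measures = [
    {0: Fraction(1), 1: Fraction(2)},
    {0: Fraction(1, 2), 1: Fraction(3)},
    {0: Fraction(5), 1: Fraction(1, 3), 2: Fraction(1)},
]


def f(w):
    """Hypothetical integrand on the product space; w is a tuple (w1, w2, w3)."""
    return Fraction(w[0] + 2 * w[1] * w[2] - w[2])


def iterated_integral(order, partial=None):
    """Integrate f one coordinate at a time; order[0] is the outermost variable."""
    partial = {} if partial is None else partial
    if not order:
        # All coordinates are fixed: evaluate the integrand at that point.
        return f(tuple(partial[j] for j in range(len(measures))))
    j, rest = order[0], order[1:]
    return sum(
        mass * iterated_integral(rest, {**partial, j: point})
        for point, mass in measures[j].items()
    )


results = {iterated_integral(order) for order in permutations(range(3))}
assert len(results) == 1  # every order of integration gives the same value
```

Here all sums are finite, so the agreement is just a rearrangement of terms; the content of the theorem is that the same holds for $\sigma$-finite measures whenever $f \ge 0$ or $\int_\Omega f\, d\mu$ exists.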


Problems 


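1. Show that the collection of finite disjoint unions of measurable rectangles in $\Omega_1 \times \cdots \times \Omega_n$ forms a field.

2. Show that $\mathcal{B}(\mathbb{R}^n) = \mathcal{B}(\mathbb{R}) \times \cdots \times \mathcal{B}(\mathbb{R})$ ($n$ times).

3. If $\mathcal{F}_1, \ldots, \mathcal{F}_n$ are arbitrary $\sigma$-fields, show that

$$(\mathcal{F}_1 \times \cdots \times \mathcal{F}_{n-1}) \times \mathcal{F}_n = \mathcal{F}_1 \times \mathcal{F}_2 \times \cdots \times \mathcal{F}_n.$$

4. Let $\mu$ be the product of the $\sigma$-finite measures $\mu_1$ and $\mu_2$. If $C \in \mathcal{F}_1 \times \mathcal{F}_2$, show that the following are equivalent:

(a) $\mu(C) = 0$,

(b) $\mu_2(C(\omega_1)) = 0$ for $\mu_1$-almost all $\omega_1 \in \Omega_1$,

(c) $\mu_1(C(\omega_2)) = 0$ for $\mu_2$-almost all $\omega_2 \in \Omega_2$.

5. In Problem 4, let $(\Omega', \mathcal{F}', \mu')$ be the completion of $(\Omega, \mathcal{F}, \mu)$, and assume $\mu_1$, $\mu_2$ complete. If $B \in \mathcal{F}'$, show that $B(\omega_1) \in \mathcal{F}_2$ for $\mu_1$-almost all $\omega_1 \in \Omega_1$ [and $B(\omega_2) \in \mathcal{F}_1$ for $\mu_2$-almost all $\omega_2 \in \Omega_2$]. Give an example in which $B(\omega_1) \notin \mathcal{F}_2$ for some $\omega_1 \in \Omega_1$.

6. (a) Let $\Omega_1 = \Omega_2 =$ the set of positive integers, $\mathcal{F}_1 = \mathcal{F}_2 =$ all subsets, $\mu_1 = \mu_2 =$ counting measure, $f(n, n) = n$, $f(n, n + 1) = -n$, $n = 1, 2, \ldots$, $f(i, j) = 0$ if $j \ne i$ or $i + 1$. Show that $\int_{\Omega_1} \int_{\Omega_2} f\, d\mu_2\, d\mu_1 = 0$, $\int_{\Omega_2} \int_{\Omega_1} f\, d\mu_1\, d\mu_2 = \infty$. (Fubini's theorem fails since the integral of $f$ with respect to $\mu_1 \times \mu_2$ does not exist.)

(b) Let $\Omega_1 = \Omega_2 = \mathbb{R}$, $\mathcal{F}_1 = \mathcal{F}_2 = \mathcal{B}(\mathbb{R})$, $\mu_1 =$ Lebesgue measure, $\mu_2 =$ counting measure. Let $A = \{(\omega_1, \omega_2): \omega_1 = \omega_2\} \in \mathcal{F}_1 \times \mathcal{F}_2$. Show that

$$\int_{\Omega_1} \int_{\Omega_2} I_A\, d\mu_2\, d\mu_1 = \int_{\Omega_1} \mu_2(A(\omega_1))\, d\mu_1(\omega_1) = \infty,$$

but

$$\int_{\Omega_2} \int_{\Omega_1} I_A\, d\mu_1\, d\mu_2 = \int_{\Omega_2} \mu_1(A(\omega_2))\, d\mu_2(\omega_2) = 0.$$

[Fubini's theorem fails since $\mu_2$ is not $\sigma$-finite; the product measure theorem fails also since $\int_{\Omega_1} \mu_2(F(\omega_1))\, d\mu_1(\omega_1)$ and $\int_{\Omega_2} \mu_1(F(\omega_2))\, d\mu_2(\omega_2)$ do not agree on $\mathcal{F}_1 \times \mathcal{F}_2$.]

7. Let $\Omega_1 = \Omega_2 =$ the first uncountable ordinal, $\mathcal{F}_1 = \mathcal{F}_2 =$ all subsets, $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$. Assume the continuum hypothesis, which identifies $\Omega_1$ and $\Omega_2$ with $[0, 1]$.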


Rao, B. V., Bull. Amer. Math. Soc. 75, 614 (1969). 
(a) If $f$ is any function from $\Omega_1$ (or from a subset of $\Omega_1$) to $[0, 1]$ and $G = \{(x, y): x \in \Omega_1,\ y = f(x)\}$ is the graph of $f$, show that $G \in \mathcal{F}$.

(b) Let $C_1 = \{(x, y) \in \Omega: y \le x\}$, $C_2 = \{(x, y) \in \Omega: y \ge x\}$. If $B \subset C_1$ or $B \subset C_2$, show that $B \in \mathcal{F}$. (The relation $y \le x$ refers to the ordering of $y$ and $x$ as ordinals, not as real numbers.)

(c) Show that $\mathcal{F}$ consists of all subsets of $\Omega$.


8. Show that a measurable function of one variable is jointly measurable. Specifically, if $g: (\Omega_1, \mathcal{F}_1) \to (\Omega', \mathcal{F}')$ and we define $f: \Omega_1 \times \Omega_2 \to \Omega'$ by $f(\omega_1, \omega_2) = g(\omega_1)$, then $f$ is measurable relative to $\mathcal{F}_1 \times \mathcal{F}_2$ and $\mathcal{F}'$, regardless of the nature of $\mathcal{F}_2$.


*9. Give an example of a function $f: [0, 1] \times [0, 1] \to [0, 1]$ such that

(a) $f(x, y)$ is Borel measurable in $y$ for each fixed $x$ and Borel measurable in $x$ for each fixed $y$,

(b) $f$ is not jointly measurable, that is, $f$ is not measurable relative to the product $\sigma$-field $\mathcal{B}[0, 1] \times \mathcal{B}[0, 1]$, and

(c) $\int_0^1 \left(\int_0^1 f(x, y)\, dy\right) dx$ and $\int_0^1 \left(\int_0^1 f(x, y)\, dx\right) dy$ exist but are unequal. (One example is suggested by Problem 7.)


2.7 MEASURES ON INFINITE PRODUCT SPACES 

The n-dimensional product measure theorem formalizes the notion of an 
n-stage random experiment, where the probability of an event associated with 
the nth stage depends on the result of the first n — 1 trials. It will be convenient 
later to have a single probability space which is adequate to handle n-stage 
experiments for n arbitrarily large (not fixed in advance). Such a space can be 
constructed if the product measure theorem can be extended to infinitely many 
dimensions. Our task is to construct the product of infinitely many σ-fields,
and we first consider the countably infinite case. 


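2.7.1 Definitions. For each $j = 1, 2, \ldots$, let $(\Omega_j, \mathcal{F}_j)$ be a measurable space. Let $\Omega = \prod_{j=1}^{\infty} \Omega_j$, the set of all sequences $(\omega_1, \omega_2, \ldots)$ such that $\omega_j \in \Omega_j$, $j = 1, 2, \ldots$. If $B^n \subset \prod_{j=1}^{n} \Omega_j$, we define

$$B_n = \{\omega \in \Omega: (\omega_1, \ldots, \omega_n) \in B^n\}.$$

The set $B_n$ is called the cylinder with base $B^n$; the cylinder is said to be measurable if $B^n \in \prod_{j=1}^{n} \mathcal{F}_j$. If $B^n = A_1 \times \cdots \times A_n$, where $A_i \subset \Omega_i$ for each $i$, $B_n$ is called a rectangle, a measurable rectangle if $A_i \in \mathcal{F}_i$ for each $i$.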

A cylinder with an n-dimensional base may always be regarded as having 
a higher dimensional base. For example, if 


$$B = \{\omega \in \Omega: (\omega_1, \omega_2, \omega_3) \in B^3\},$$
then

$$B = \{\omega \in \Omega: (\omega_1, \omega_2, \omega_3) \in B^3,\ \omega_4 \in \Omega_4\} = \{\omega \in \Omega: (\omega_1, \omega_2, \omega_3, \omega_4) \in B^3 \times \Omega_4\}.$$


It follows that the measurable cylinders form a field. It is also true that finite 
disjoint unions of measurable rectangles form a field; the argument is the same 
as in Problem 1 of Section 2.6. 

The minimal σ-field over the measurable cylinders is called the product of the σ-fields $\mathcal{F}_j$, written $\prod_{j=1}^{\infty} \mathcal{F}_j$; $\prod_{j=1}^{\infty} \mathcal{F}_j$ is also the minimal σ-field over the measurable rectangles (see Problem 1). If all $\mathcal{F}_j$ coincide with a fixed σ-field $\mathcal{F}$, then $\prod_{j=1}^{\infty} \mathcal{F}_j$ is denoted by $\mathcal{F}^{\infty}$, and if all $\Omega_j$ coincide with a fixed set $S$, $\prod_{j=1}^{\infty} \Omega_j$ is denoted by $S^{\infty}$.

The infinite-dimensional version of the product measure theorem will be used only for probability measures, and is therefore stated in that context. (In fact the construction to be described below runs into trouble for nonprobability measures.)


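2.7.2 Theorem. Let $(\Omega_j, \mathcal{F}_j)$, $j = 1, 2, \ldots$, be arbitrary measurable spaces; let $\Omega = \prod_{j=1}^{\infty} \Omega_j$, $\mathcal{F} = \prod_{j=1}^{\infty} \mathcal{F}_j$. Suppose that we are given an arbitrary probability measure $P_1$ on $\mathcal{F}_1$, and for each $j = 1, 2, \ldots$ and each $(\omega_1, \ldots, \omega_j) \in \Omega_1 \times \cdots \times \Omega_j$ we are given a probability measure $P(\omega_1, \ldots, \omega_j, \cdot)$ on $\mathcal{F}_{j+1}$. Assume that $P(\omega_1, \ldots, \omega_j, C)$ is measurable: $\left(\prod_{i=1}^{j} \Omega_i, \prod_{i=1}^{j} \mathcal{F}_i\right) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ for each fixed $C \in \mathcal{F}_{j+1}$.

If $B^n \in \prod_{j=1}^{n} \mathcal{F}_j$, define

$$P_n(B^n) = \int_{\Omega_1} P_1(d\omega_1) \int_{\Omega_2} P(\omega_1, d\omega_2) \cdots \int_{\Omega_{n-1}} P(\omega_1, \ldots, \omega_{n-2}, d\omega_{n-1}) \int_{\Omega_n} I_{B^n}(\omega_1, \ldots, \omega_n) \, P(\omega_1, \ldots, \omega_{n-1}, d\omega_n).$$

Note that $P_n$ is a probability measure on $\prod_{j=1}^{n} \mathcal{F}_j$ by 2.6.7 and 2.6.8(a).

There is a unique probability measure $P$ on $\mathcal{F}$ such that for all $n$, $P$ agrees with $P_n$ on $n$-dimensional cylinders, that is, $P\{\omega \in \Omega: (\omega_1, \ldots, \omega_n) \in B^n\} = P_n(B^n)$ for all $n = 1, 2, \ldots$ and all $B^n \in \prod_{j=1}^{n} \mathcal{F}_j$.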


Proof. Any measurable cylinder can be represented in the form $B_n = \{\omega \in \Omega: (\omega_1, \ldots, \omega_n) \in B^n\}$ for some $n$ and some $B^n \in \prod_{j=1}^{n} \mathcal{F}_j$; define $P(B_n) = P_n(B^n)$. We must show that $P$ is well-defined on measurable cylinders. For suppose that $B_n$ can also be expressed as $\{\omega \in \Omega: (\omega_1, \ldots, \omega_m) \in C^m\}$ where $C^m \in \prod_{j=1}^{m} \mathcal{F}_j$; we must show that $P_n(B^n) = P_m(C^m)$. Say $m < n$; then $(\omega_1, \ldots, \omega_m) \in C^m$ iff $(\omega_1, \ldots, \omega_n) \in B^n$, hence $B^n = C^m \times \Omega_{m+1} \times \cdots \times \Omega_n$. It follows from the definition of $P_n$ that $P_n(B^n) = P_m(C^m)$. (The fact that the $P(\omega_1, \ldots, \omega_n, \cdot)$ are probability measures is used here.)


Since $P_n$ is a measure on $\prod_{j=1}^{n} \mathcal{F}_j$, it is immediate that $P$ is finitely additive on the field $\mathcal{F}_0$ of measurable cylinders. If we can show that $P$ is continuous from above at the empty set, 1.2.8(b) implies that $P$ is countably additive on $\mathcal{F}_0$, and the Carathéodory extension theorem extends $P$ to a probability measure on $\prod_{j=1}^{\infty} \mathcal{F}_j$; by construction, $P$ agrees with $P_n$ on $n$-dimensional cylinders.

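Let $\{B_n, n = n_1, n_2, \ldots\}$ be a sequence of measurable cylinders decreasing to $\emptyset$ (we may assume $n_1 < n_2 < \cdots$, and in fact nothing is lost if we take $n_i = i$ for all $i$). Assume $\lim_{n \to \infty} P(B_n) > 0$. Then for each $n > 1$,

$$P(B_n) = \int_{\Omega_1} g_n^{(1)}(\omega_1) \, P_1(d\omega_1),$$

where

$$g_n^{(1)}(\omega_1) = \int_{\Omega_2} P(\omega_1, d\omega_2) \cdots \int_{\Omega_n} I_{B^n}(\omega_1, \ldots, \omega_n) \, P(\omega_1, \ldots, \omega_{n-1}, d\omega_n).$$

Since $B_{n+1} \subset B_n$, it follows that $B^{n+1} \subset B^n \times \Omega_{n+1}$; hence

$$I_{B^{n+1}}(\omega_1, \ldots, \omega_{n+1}) \le I_{B^n}(\omega_1, \ldots, \omega_n).$$

Therefore $g_n^{(1)}(\omega_1)$ decreases as $n$ increases ($\omega_1$ fixed); say $g_n^{(1)}(\omega_1) \to h_1(\omega_1)$. By the extended monotone convergence theorem (or the dominated convergence theorem), $P(B_n) \to \int_{\Omega_1} h_1(\omega_1) \, P_1(d\omega_1)$. If $\lim_{n \to \infty} P(B_n) > 0$, then $h_1(\omega_1') > 0$ for some $\omega_1' \in \Omega_1$. In fact $\omega_1' \in B^1$, for if not, $I_{B^n}(\omega_1', \omega_2, \ldots, \omega_n) = 0$ for all $n$; hence $g_n^{(1)}(\omega_1') = 0$ for all $n$, and $h_1(\omega_1') = 0$, a contradiction.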

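Now for each $n > 2$,

$$g_n^{(1)}(\omega_1') = \int_{\Omega_2} g_n^{(2)}(\omega_2) \, P(\omega_1', d\omega_2),$$

where

$$g_n^{(2)}(\omega_2) = \int_{\Omega_3} P(\omega_1', \omega_2, d\omega_3) \cdots \int_{\Omega_n} I_{B^n}(\omega_1', \omega_2, \ldots, \omega_n) \, P(\omega_1', \omega_2, \ldots, \omega_{n-1}, d\omega_n).$$

As above, $g_n^{(2)}(\omega_2) \downarrow h_2(\omega_2)$; hence

$$g_n^{(1)}(\omega_1') \to \int_{\Omega_2} h_2(\omega_2) \, P(\omega_1', d\omega_2).$$

Since $g_n^{(1)}(\omega_1') \to h_1(\omega_1') > 0$, we have $h_2(\omega_2') > 0$ for some $\omega_2' \in \Omega_2$, and as above we have $(\omega_1', \omega_2') \in B^2$.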

The process may be repeated inductively to obtain points $\omega_1', \omega_2', \ldots$ such that for each $n$, $(\omega_1', \ldots, \omega_n') \in B^n$. But then $(\omega_1', \omega_2', \ldots) \in \bigcap_{n=1}^{\infty} B_n = \emptyset$, a contradiction. This proves the existence of the desired probability measure $P$. If $Q$ is another such probability measure, then $P = Q$ on measurable cylinders, hence $P = Q$ on $\mathcal{F}$ by the uniqueness part of the Carathéodory extension theorem. □
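
When every factor space is finite, each integral in the definition of $P_n$ reduces to a sum, so the measures of Theorem 2.7.2 can be computed directly. The following Python sketch illustrates this under stated assumptions: the two-point state space `STATES`, the initial measure `p1`, and the kernel `kernel` are hypothetical choices made for the example (the kernel happens to look only at the last coordinate), not part of the theorem.

```python
from fractions import Fraction
from itertools import product

STATES = (0, 1)  # assumption: every factor space Omega_j is {0, 1}

def p1(w1):
    """Assumed initial measure P_1: a fair coin."""
    return Fraction(1, 2)

def kernel(history, w_next):
    """Assumed kernel P(w_1, ..., w_j, {w_next}): repeat the last outcome with probability 2/3."""
    return Fraction(2, 3) if w_next == history[-1] else Fraction(1, 3)

def path_prob(path):
    """Mass that the iterated integral assigns to the single point `path`."""
    prob = p1(path[0])
    for j in range(1, len(path)):
        prob *= kernel(path[:j], path[j])
    return prob

def P_n(base, n):
    """P_n(B^n): on finite spaces the iterated integral is a sum over the points of the base."""
    return sum(path_prob(path) for path in product(STATES, repeat=n) if path in base)

B2 = {(0, 0), (1, 1)}                               # a base in Omega_1 x Omega_2
B3 = {path + (w,) for path in B2 for w in STATES}   # same cylinder, base B2 x Omega_3

print(P_n(B2, 2), P_n(B3, 3))
```

Both printed values are 2/3, which is the well-definedness step of the proof in miniature: enlarging the base from $B^2$ to $B^2 \times \Omega_3$ does not change the value because each kernel has total mass 1.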


The classical product measure theorem extends as follows: 


2.7.3 Corollary. For each $j = 1, 2, \ldots$, let $(\Omega_j, \mathcal{F}_j, P_j)$ be an arbitrary probability space. Let $\Omega = \prod_{j=1}^{\infty} \Omega_j$, $\mathcal{F} = \prod_{j=1}^{\infty} \mathcal{F}_j$. There is a unique probability measure $P$ on $\mathcal{F}$ such that

$$P\{\omega \in \Omega: \omega_1 \in A_1, \ldots, \omega_n \in A_n\} = \prod_{j=1}^{n} P_j(A_j)$$

for all $n = 1, 2, \ldots$ and all $A_j \in \mathcal{F}_j$, $j = 1, 2, \ldots$. We call $P$ the product of the $P_j$, and write $P = \prod_{j=1}^{\infty} P_j$.

Proof. In 2.7.2, take $P(\omega_1, \ldots, \omega_j, B) = P_{j+1}(B)$, $B \in \mathcal{F}_{j+1}$. Then $P_n(A_1 \times \cdots \times A_n) = \prod_{j=1}^{n} P_j(A_j)$, and thus the probability measure $P$ of 2.7.2 has the desired properties. If $Q$ is another such probability measure, then $P = Q$ on the field of finite disjoint unions of measurable rectangles; hence $P = Q$ on $\mathcal{F}$ by the Carathéodory extension theorem. □
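
The rectangle formula of Corollary 2.7.3 can be checked by brute force on small finite spaces. The sketch below is only an illustration: the three biased coins in `P` are assumed factor measures chosen for the example. It compares the product $\prod_j P_j(A_j)$ with the sum of the point masses over $A_1 \times A_2 \times A_3$.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Assumed factor measures P_1, P_2, P_3 on {0, 1}: coins with different biases.
P = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(1, 3), 1: Fraction(2, 3)},
    {0: Fraction(3, 4), 1: Fraction(1, 4)},
]

def factor_measure(j, A):
    """P_j(A) for a subset A of {0, 1}."""
    return sum(P[j][w] for w in A)

def rectangle_prob(sides):
    """Right side of 2.7.3: the product of the P_j(A_j)."""
    return prod(factor_measure(j, A) for j, A in enumerate(sides))

def rectangle_prob_by_enumeration(sides):
    """Left side: add the point masses of all (w_1, ..., w_n) in A_1 x ... x A_n."""
    return sum(prod(P[j][w] for j, w in enumerate(point)) for point in product(*sides))

sides = [{0, 1}, {1}, {0}]
print(rectangle_prob(sides), rectangle_prob_by_enumeration(sides))
```

Both expressions evaluate to 1/2 for this rectangle.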


Thus far we have considered probability measures on countably infinite 
product spaces. The results may be extended to uncountable products if cer- 
tain assumptions are made about the individual factor spaces. We will be 
completely general at the beginning, but when we reach the main result, the 
Kolmogorov extension theorem, we will assume that all the factor spaces are 
the reals, with the σ-field of Borel sets.


2.7.4 Definitions and Comments. For $t$ in the arbitrary index set $T$, let $(\Omega_t, \mathcal{F}_t)$ be a measurable space. Let $\prod_{t \in T} \Omega_t$ be the set of all functions $\omega = (\omega(t), t \in T)$ on $T$ such that $\omega(t) \in \Omega_t$ for each $t \in T$. If $t_1, \ldots, t_n \in T$ and $B^n \subset \prod_{i=1}^{n} \Omega_{t_i}$, we define the set $B^n(t_1, \ldots, t_n)$ as $\{\omega \in \prod_{t \in T} \Omega_t: (\omega(t_1), \ldots, \omega(t_n)) \in B^n\}$. We call $B^n(t_1, \ldots, t_n)$ the cylinder with base $B^n$ at $(t_1, \ldots, t_n)$; the cylinder is said to be measurable iff $B^n \in \prod_{i=1}^{n} \mathcal{F}_{t_i}$. If $B^n = B_1 \times \cdots \times B_n$, the cylinder is called a rectangle, a measurable rectangle iff $B_i \in \mathcal{F}_{t_i}$, $i = 1, \ldots, n$. Note that if all $\Omega_t = \Omega$, then $\prod_{t \in T} \Omega_t = \Omega^T$, the set of all functions from $T$ to $\Omega$.

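For example, let $T = [0, 1]$, $\Omega_t = \mathbb{R}$ for all $t \in T$, $B^2 = \{(u, v): u > 3,\ 1 < v < 2\}$. Then

$$B^2\left(\tfrac{1}{2}, \tfrac{3}{4}\right) = \left\{x \in \mathbb{R}^T: x\left(\tfrac{1}{2}\right) > 3,\ 1 < x\left(\tfrac{3}{4}\right) < 2\right\}$$

[see Fig. 2.7.1, where $x_1 \in B^2\left(\tfrac{1}{2}, \tfrac{3}{4}\right)$ and $x_2 \notin B^2\left(\tfrac{1}{2}, \tfrac{3}{4}\right)$].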

Exactly as in 2.7.1, the measurable cylinders form a field, as do the finite disjoint unions of measurable rectangles. The minimal σ-field over the measurable cylinders is denoted by $\prod_{t \in T} \mathcal{F}_t$, and called the product of the σ-fields $\mathcal{F}_t$. If $\Omega_t = S$ and $\mathcal{F}_t = \mathcal{S}$ for all $t$, $\prod_{t \in T} \mathcal{F}_t$ is denoted by $\mathcal{S}^T$. Again as in 2.7.1, $\prod_{t \in T} \mathcal{F}_t$ is also the minimal σ-field over the measurable rectangles.

We now consider the problem of constructing probability measures on $\prod_{t \in T} \mathcal{F}_t$. The approach will be as follows: Let $v = \{t_1, \ldots, t_n\}$ be a finite subset of $T$, where $t_1 < t_2 < \cdots < t_n$. (If $T$ is not a subset of $\mathbb{R}$, some fixed total ordering is put on $T$.) Assume that for each such $v$ we are given a probability measure $P_v$ on $\prod_{i=1}^{n} \mathcal{F}_{t_i}$; $P_v(B)$ is to represent $P\{\omega \in \prod_{t \in T} \Omega_t: (\omega(t_1), \ldots, \omega(t_n)) \in B\}$. We shall require that the $P_v$ be “consistent”; to see what kind of consistency is needed, consider an example.

Figure 2.7.1.

Suppose $T$ is the set of positive integers and $\Omega_t = \mathbb{R}$, $\mathcal{F}_t = \mathcal{B}(\mathbb{R})$ for all $t$. Suppose we know $P_{12345}(B^5) = P\{\omega: (\omega_1, \omega_2, \omega_3, \omega_4, \omega_5) \in B^5\}$ for all $B^5 \in \mathcal{B}(\mathbb{R}^5)$. Then $P\{\omega: (\omega_2, \omega_3) \in B^2\} = P\{\omega: (\omega_1, \omega_2, \omega_3, \omega_4, \omega_5) \in \mathbb{R} \times B^2 \times \mathbb{R}^2\} = P_{12345}(\mathbb{R} \times B^2 \times \mathbb{R}^2)$, $B^2 \in \mathcal{B}(\mathbb{R}^2)$. Thus once probabilities of sets involving the first five coordinates are specified, probabilities of sets involving $(\omega_2, \omega_3)$ [as well as $(\omega_1, \omega_3, \omega_4)$, and so on], are determined. Thus the original specification of $P_{23}$ must agree with the measure induced from $P_{12345}$. We hope that a consistent family of probability measures $P_v$ will determine a unique probability measure on $\prod_{t \in T} \mathcal{F}_t$.

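Now to formalize: If $v = \{t_1, \ldots, t_n\}$, $t_1 < \cdots < t_n$, the space $\left(\prod_{i=1}^{n} \Omega_{t_i}, \prod_{i=1}^{n} \mathcal{F}_{t_i}\right)$ is denoted by $(\Omega_v, \mathcal{F}_v)$. If $u = \{t_{i_1}, \ldots, t_{i_k}\}$ is a nonempty subset of $v$ and $y = (y(t_1), \ldots, y(t_n)) \in \Omega_v$, the $k$-tuple $(y(t_{i_1}), \ldots, y(t_{i_k}))$ is denoted by $y_u$. Similarly if $\omega = (\omega(t), t \in T)$ belongs to $\prod_{t \in T} \Omega_t$, the notation $\omega_v$ will be used for $(\omega(t_1), \ldots, \omega(t_n))$. If $B \in \mathcal{F}_v$, the measurable cylinder with base $B$ will be written as $B(v)$.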

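If $P_v$ is a probability measure on $\mathcal{F}_v$, the projection of $P_v$ on $\mathcal{F}_u$ is the probability measure $\pi_u(P_v)$ on $\mathcal{F}_u$ defined by

$$[\pi_u(P_v)](B) = P_v\{y \in \Omega_v: y_u \in B\}, \quad B \in \mathcal{F}_u.$$

Similarly, if $Q$ is a probability measure on $\prod_{t \in T} \mathcal{F}_t$, the projection of $Q$ on $\mathcal{F}_v$ is defined by

$$[\pi_v(Q)](B) = Q\left\{\omega \in \prod_{t \in T} \Omega_t: \omega_v \in B\right\} = Q(B(v)), \quad B \in \mathcal{F}_v.$$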
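
For measures with finitely many atoms, the projection $\pi_u(P_v)$ amounts to summing point masses over the coordinates that are dropped. The Python sketch below is a minimal illustration under assumptions: the index set $v = (1, 2, 3)$, the two-point coordinate space, and the helper `stay` are hypothetical choices, and measures are stored as dictionaries from points to masses.

```python
from fractions import Fraction
from itertools import product

def project(P_v, v, u):
    """Return pi_u(P_v): the mass of B is P_v{y in Omega_v : y_u in B}, computed point by point."""
    positions = [v.index(t) for t in u]          # where the indices of u sit inside v
    P_u = {}
    for y, mass in P_v.items():
        y_u = tuple(y[i] for i in positions)     # the sub-tuple y_u
        P_u[y_u] = P_u.get(y_u, Fraction(0)) + mass
    return P_u

def stay(a, b):
    """Assumed one-step probability: repeat the previous value with probability 2/3."""
    return Fraction(2, 3) if a == b else Fraction(1, 3)

v = (1, 2, 3)
P_v = {(a, b, c): Fraction(1, 2) * stay(a, b) * stay(b, c)
       for a, b, c in product((0, 1), repeat=3)}

P_23 = project(P_v, v, (2, 3))
P_12 = project(P_v, v, (1, 2))
print(P_23)
# Projecting in two steps agrees with projecting directly.
print(project(P_12, (1, 2), (1,)) == project(P_v, v, (1,)))
```

A family obtained by projecting one fixed measure in this way is automatically consistent; the question of interest is the converse, namely whether a consistent family arises from a single measure on the full product.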


Our main result, the Kolmogorov extension theorem, can be proved when each $\Omega_t$ is a complete, separable metric space, with $\mathcal{F}_t$ the class of Borel sets (the σ-field generated by the open sets). However, to avoid serious technical complications, we will take all $\Omega_t$ to be $\mathbb{R}$ and $\mathcal{F}_t = \mathcal{B}(\mathbb{R})$.


2.7.5 Kolmogorov Extension Theorem. For each $t$ in the arbitrary index set $T$, let $\Omega_t = \mathbb{R}$ and $\mathcal{F}_t$ the Borel sets of $\mathbb{R}$.

Assume that for each finite nonempty subset $v$ of $T$, we are given a probability measure $P_v$ on $\mathcal{F}_v$. Assume the $P_v$ are consistent, that is, $\pi_u(P_v) = P_u$ for each nonempty $u \subset v$.

Then there is a unique probability measure $P$ on $\mathcal{F} = \prod_{t \in T} \mathcal{F}_t$ such that $\pi_v(P) = P_v$ for all $v$.


Proof. We define the hoped-for measure on measurable cylinders by $P(B^n(v)) = P_v(B^n)$, $B^n \in \mathcal{F}_v$.

We must show that this definition makes sense since a given measurable cylinder can be represented in several ways. For example, suppose that $B^2 = (-\infty, 3) \times (4, 5)$. Then

$$B^2(t_1, t_2) = \{\omega: \omega(t_1) < 3,\ 4 < \omega(t_2) < 5\} = \{\omega: \omega(t_1) < 3,\ 4 < \omega(t_2) < 5,\ \omega(t_3) \in \mathbb{R}\} = B^3(t_1, t_2, t_3),$$

where $B^3 = (-\infty, 3) \times (4, 5) \times \mathbb{R}$.

It is sufficient to consider dual representation of the same measurable cylinder in the form $B^n(v) = B^k(u)$ where $k < n$ and $u \subset v$. But then

$$P_u(B^k) = [\pi_u(P_v)](B^k) \quad \text{by the consistency hypothesis}$$

$$= P_v\{y \in \Omega_v: y_u \in B^k\} \quad \text{by definition of projection.}$$

But the assumption $B^n(v) = B^k(u)$ implies that if $y \in \Omega_v$, then $y \in B^n$ iff $y_u \in B^k$, hence $P_u(B^k) = P_v(B^n)$, as desired.

Thus, $P$ is well-defined on measurable cylinders; the class $\mathcal{F}_0$ of measurable cylinders forms a field, and $\sigma(\mathcal{F}_0) = \mathcal{F}$.


Now if $A_1, \ldots, A_m$ are disjoint sets in $\mathcal{F}_0$, we may write (by introducing extra factors as in the above example) $A_i = B_i^n(v)$, $i = 1, \ldots, m$, where $v = \{t_1, \ldots, t_n\}$ is fixed and the $B_i^n$, $i = 1, \ldots, m$, are disjoint sets in $\mathcal{F}_v$. Thus

$$P\Big(\bigcup_{i=1}^{m} A_i\Big) = P_v\Big(\bigcup_{i=1}^{m} B_i^n\Big) \quad \text{by definition of } P$$

$$= \sum_{i=1}^{m} P_v(B_i^n) \quad \text{since } P_v \text{ is a measure, and}$$

$$= \sum_{i=1}^{m} P(A_i) \quad \text{again by definition of } P.$$

Therefore $P$ is finitely additive on $\mathcal{F}_0$. To show that $P$ is countably additive on $\mathcal{F}_0$, we must verify that $P$ is continuous from above at $\emptyset$ and invoke 1.2.8(b). The Carathéodory extension theorem (1.3.10) then extends $P$ to $\mathcal{F}$.

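Let $A_k$, $k = 1, 2, \ldots$ be a sequence of measurable cylinders decreasing to $\emptyset$. If $P(A_k)$ does not approach 0, we have, for some $\varepsilon > 0$, $P(A_k) \ge \varepsilon > 0$ for all $k$. Suppose $A_k = B^{n_k}(v_k)$; by tacking on extra factors, we may assume that the numbers $n_k$ and the sets $v_k$ increase with $k$.

It follows from 1.4.11 that we can find a compact set $C^{n_k} \subset B^{n_k}$ such that $P_{v_k}(B^{n_k} - C^{n_k}) < \varepsilon / 2^{k+1}$. Define $A_k' = C^{n_k}(v_k) \subset A_k$. Then $P(A_k - A_k') = P_{v_k}(B^{n_k} - C^{n_k}) < \varepsilon / 2^{k+1}$. In this way we approximate the given cylinders by cylinders with compact bases.

Now take

$$D_k = A_1' \cap A_2' \cap \cdots \cap A_k' \subset A_1 \cap A_2 \cap \cdots \cap A_k = A_k.$$

Then, since $A_k \subset A_i$ for $i \le k$,

$$P(A_k - D_k) = P\Big(A_k \cap \bigcup_{i=1}^{k} (A_i')^c\Big) \le \sum_{i=1}^{k} P(A_k \cap (A_i')^c) \le \sum_{i=1}^{k} P(A_i - A_i') < \sum_{i=1}^{k} \frac{\varepsilon}{2^{i+1}} < \frac{\varepsilon}{2}.$$

Since $D_k \subset A_k' \subset A_k$, $P(A_k - D_k) = P(A_k) - P(D_k)$, consequently $P(D_k) > P(A_k) - \varepsilon/2$. In particular, $D_k$ is not empty.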

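Now pick $x^k \in D_k$, $k = 1, 2, \ldots$. Let $A_1' = C^{n_1}(t_1, \ldots, t_{n_1}) = C^{n_1}(v_1)$ [note all $D_k \subset A_1'$]. Consider the sequence

$$(x^1_{t_1}, \ldots, x^1_{t_{n_1}}),\ (x^2_{t_1}, \ldots, x^2_{t_{n_1}}),\ (x^3_{t_1}, \ldots, x^3_{t_{n_1}}),\ \ldots,$$

that is, $x^1_{v_1}, x^2_{v_1}, x^3_{v_1}, \ldots$.

Since the $x^n_{v_1}$ belong to $C^{n_1}$, a compact subset of $\Omega_{v_1}$, we have a convergent subsequence $x^{r_{1n}}_{v_1}$ approaching some $x_{v_1} \in C^{n_1}$. If $A_2' = C^{n_2}(v_2)$ (so $D_k \subset A_2'$ for $k \ge 2$), consider the sequence $x^{r_{11}}_{v_2}, x^{r_{12}}_{v_2}, x^{r_{13}}_{v_2}, \ldots \in C^{n_2}$ (eventually), and extract a convergent subsequence $x^{r_{2n}}_{v_2} \to x_{v_2} \in C^{n_2}$.

Note that $(x^{r_{2n}}_{v_2})_{v_1} = x^{r_{2n}}_{v_1}$; as $n \to \infty$, the left side approaches $(x_{v_2})_{v_1}$, and as $\{r_{2n}\}$ is a subsequence of $\{r_{1n}\}$, the right side approaches $x_{v_1}$. Hence $(x_{v_2})_{v_1} = x_{v_1}$.

Continue in this fashion; at step $i$ we have a subsequence

$$x^{r_{in}}_{v_i} \to x_{v_i} \in C^{n_i}, \quad \text{and} \quad (x_{v_i})_{v_j} = x_{v_j} \text{ for } j < i.$$

Pick any $\omega \in \prod_{t \in T} \Omega_t$ such that $\omega_{v_j} = x_{v_j}$ for all $j = 1, 2, \ldots$ (such a choice is possible since $(x_{v_i})_{v_j} = x_{v_j}$, $j < i$). Then $\omega_{v_j} \in C^{n_j}$ for each $j$; hence

$$\omega \in \bigcap_{j=1}^{\infty} A_j' \subset \bigcap_{j=1}^{\infty} A_j = \emptyset,$$

a contradiction. Thus $P$ extends to a measure on $\mathcal{F}$, and by construction, $\pi_v(P) = P_v$ for all $v$.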




Finally, if $P$ and $Q$ are two probability measures on $\mathcal{F}$ such that $\pi_v(P) = \pi_v(Q)$ for all finite $v \subset T$, then for any $B^n \in \mathcal{F}_v$,

$$P(B^n(v)) = [\pi_v(P)](B^n) = [\pi_v(Q)](B^n) = Q(B^n(v)).$$

Thus $P$ and $Q$ agree on measurable cylinders, and hence on $\mathcal{F}$ by the uniqueness part of the Carathéodory extension theorem. □


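Problems

1. Show that $\prod_{j=1}^{\infty} \mathcal{F}_j$ is the minimal σ-field over the measurable rectangles.

2. Let $\mathcal{F} = \mathcal{B}(\mathbb{R})$; show that the following sets belong to $\mathcal{F}^{\infty}$:

(a) $\{x \in \mathbb{R}^{\infty}: \sup_n x_n < a\}$,

(b) $\{x \in \mathbb{R}^{\infty}: \sum_{n=1}^{\infty} |x_n| < a\}$,

(c) $\{x \in \mathbb{R}^{\infty}: \lim_{n \to \infty} x_n \text{ exists and is finite}\}$,

(d) $\{x \in \mathbb{R}^{\infty}: \limsup_{n \to \infty} x_n < a\}$, and

(e) $\{x \in \mathbb{R}^{\infty}: \sum_{k=1}^{n} x_k = 0 \text{ for at least one } n > 0\}$.

3. Let $\mathcal{F}$ be a σ-field of subsets of a set $S$, and assume $\mathcal{F}$ is countably generated, that is, there is a sequence of sets $A_1, A_2, \ldots$ in $\mathcal{F}$ such that the smallest σ-field containing the $A_i$ is $\mathcal{F}$. Show that $\mathcal{F}^{\infty}$ is also countably generated. In particular, $\mathcal{B}(\mathbb{R})$ is countably generated; take the $A_i$ as intervals with rational endpoints.

*4. How many sets are there in $\mathcal{B}(\mathbb{R})$?

5. Define $f: \mathbb{R}^{\infty} \to \overline{\mathbb{R}}$ as follows:

$$f(x_1, x_2, \ldots) = \begin{cases} \text{the smallest positive integer } n \text{ such that } x_1 + \cdots + x_n \ge 1, & \text{if such an } n \text{ exists}, \\ \infty & \text{if } x_1 + \cdots + x_n < 1 \text{ for all } n. \end{cases}$$

Show that $f: (\mathbb{R}^{\infty}, \mathcal{B}(\mathbb{R})^{\infty}) \to (\overline{\mathbb{R}}, \mathcal{B}(\overline{\mathbb{R}}))$.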


2.8 WEAK CONVERGENCE OF MEASURES 

In 2.5 we studied convergence of sequences of measurable functions. We
now examine a somewhat different notion of convergence. The results form 
the starting point for the study of the central limit theorem of probability. 

We will need some properties of semicontinuous functions; proofs of all 
necessary results are given in Appendix 2. 

If $\Omega$ is a metric space, the class of Borel sets of $\Omega$, denoted by $\mathcal{B}(\Omega)$, is defined as the σ-field generated by the open subsets of $\Omega$. In our applications to probability, we will need only the case $\Omega = \mathbb{R}^n$.

2.8.1 Theorem. Let $\mu, \mu_1, \mu_2, \ldots$ be finite measures on the Borel sets of a metric space $\Omega$. The following conditions are equivalent:

(a) $\int_{\Omega} f \, d\mu_n \to \int_{\Omega} f \, d\mu$ for all bounded continuous $f: \Omega \to \mathbb{R}$.

(b) $\liminf_{n \to \infty} \int_{\Omega} f \, d\mu_n \ge \int_{\Omega} f \, d\mu$ for all bounded lower semicontinuous $f: \Omega \to \mathbb{R}$.

(b′) $\limsup_{n \to \infty} \int_{\Omega} f \, d\mu_n \le \int_{\Omega} f \, d\mu$ for all bounded upper semicontinuous $f: \Omega \to \mathbb{R}$.

(c) $\int_{\Omega} f \, d\mu_n \to \int_{\Omega} f \, d\mu$ for all bounded $f: (\Omega, \mathcal{B}(\Omega)) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $f$ is continuous a.e. $[\mu]$.

(d) $\liminf_{n \to \infty} \mu_n(A) \ge \mu(A)$ for every open set $A \subset \Omega$, and $\mu_n(\Omega) \to \mu(\Omega)$.

(d′) $\limsup_{n \to \infty} \mu_n(A) \le \mu(A)$ for every closed set $A \subset \Omega$, and $\mu_n(\Omega) \to \mu(\Omega)$.

(e) $\mu_n(A) \to \mu(A)$ for every $A \in \mathcal{B}(\Omega)$ such that $\mu(\partial A) = 0$ ($\partial A$ denotes the boundary of $A$).


Proof. (a) implies (b): If $g \le f$ and $g$ is bounded continuous,

$$\liminf_{n \to \infty} \int_{\Omega} f \, d\mu_n \ge \liminf_{n \to \infty} \int_{\Omega} g \, d\mu_n = \int_{\Omega} g \, d\mu \quad \text{by (a).}$$

But since $f$ is lower semicontinuous (LSC), it is the limit of a sequence of continuous functions, and if $|f| \le M$, all functions in the sequence can also be taken less than or equal to $M$ in absolute value. Thus if we take the sup over $g$ in the above equation, we obtain (b).

(b) is equivalent to (b′): Note that $f$ is LSC iff $-f$ is upper semicontinuous (USC).

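(b) implies (c): Let $\underline{f}$ be the lower envelope of $f$ (the sup of all LSC functions $g$ such that $g \le f$) and $\overline{f}$ the upper envelope (the inf of all USC functions $g$ such that $g \ge f$). Since $\underline{f}(x) = \liminf_{y \to x} f(y)$ and $\overline{f}(x) = \limsup_{y \to x} f(y)$, continuity of $f$ at $x$ implies $\underline{f}(x) = f(x) = \overline{f}(x)$. Furthermore, $\underline{f}$ is LSC and $\overline{f}$ is USC. Thus if $f$ is bounded and continuous a.e. $[\mu]$,

$$\int_{\Omega} f \, d\mu = \int_{\Omega} \underline{f} \, d\mu \le \liminf_{n \to \infty} \int_{\Omega} \underline{f} \, d\mu_n \quad \text{by (b)}$$

$$\le \liminf_{n \to \infty} \int_{\Omega} f \, d\mu_n \quad \text{since } \underline{f} \le f$$

$$\le \limsup_{n \to \infty} \int_{\Omega} f \, d\mu_n \le \limsup_{n \to \infty} \int_{\Omega} \overline{f} \, d\mu_n \quad \text{since } f \le \overline{f}$$

$$\le \int_{\Omega} \overline{f} \, d\mu \quad \text{by (b′)}$$

$$= \int_{\Omega} f \, d\mu, \quad \text{proving (c).}$$

(c) implies (d): Clearly (c) implies (a), which in turn implies (b). If $A$ is open, then $I_A$ is LSC, so by (b), $\liminf_{n \to \infty} \mu_n(A) \ge \mu(A)$. Now $I_{\Omega} \equiv 1$, so $\mu_n(\Omega) \to \mu(\Omega)$ by (c).

(d) is equivalent to (d′): Take complements.

(d) implies (e): Let $A^0$ be the interior of $A$, $\overline{A}$ the closure of $A$. Then

$$\limsup_{n \to \infty} \mu_n(A) \le \limsup_{n \to \infty} \mu_n(\overline{A}) \le \mu(\overline{A}) \quad \text{by (d′)}$$

$$= \mu(A) \quad \text{by hypothesis.}$$

Also, using (d),

$$\liminf_{n \to \infty} \mu_n(A) \ge \liminf_{n \to \infty} \mu_n(A^0) \ge \mu(A^0) = \mu(A).$$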


(e) implies (a): Let $f$ be a bounded continuous function on $\Omega$. If $|f| < M$, let $A = \{c \in \mathbb{R}: \mu(f^{-1}\{c\}) \ne 0\}$; $A$ is countable since the $f^{-1}\{c\}$ are disjoint and $\mu$ is finite. Construct a partition of $[-M, M]$, say $-M = t_0 < t_1 < \cdots < t_j = M$, with $t_i \notin A$, $i = 0, 1, \ldots, j$ ($M$ may be increased if necessary). If $B_i = \{x: t_i \le f(x) < t_{i+1}\}$, $i = 0, 1, \ldots, j - 1$, it follows from (e) that

$$\sum_{i=0}^{j-1} t_i \mu_n(B_i) \to \sum_{i=0}^{j-1} t_i \mu(B_i).$$

[As $f^{-1}(t_i, t_{i+1})$ is open, $\partial f^{-1}[t_i, t_{i+1}) \subset f^{-1}\{t_i, t_{i+1}\}$ and $\mu f^{-1}\{t_i, t_{i+1}\} = 0$ as $t_i, t_{i+1} \notin A$.] Now

$$\left| \int_{\Omega} f \, d\mu_n - \int_{\Omega} f \, d\mu \right| \le \left| \int_{\Omega} f \, d\mu_n - \sum_{i=0}^{j-1} t_i \mu_n(B_i) \right| + \left| \sum_{i=0}^{j-1} t_i \mu_n(B_i) - \sum_{i=0}^{j-1} t_i \mu(B_i) \right| + \left| \sum_{i=0}^{j-1} t_i \mu(B_i) - \int_{\Omega} f \, d\mu \right|.$$

The first term on the right may be written as

$$\left| \sum_{i=0}^{j-1} \int_{B_i} (f(x) - t_i) \, d\mu_n(x) \right|$$

and this is bounded by $\max_i (t_{i+1} - t_i) \mu_n(\Omega)$, which can be made arbitrarily small by choice of the partition since $\mu_n(\Omega) \to \mu(\Omega) < \infty$ by (e). The third term on the right is bounded by $\max_i (t_{i+1} - t_i) \mu(\Omega)$, which can also be made arbitrarily small. The second term approaches 0 as $n \to \infty$, proving (a). □
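
A concrete instance of condition (a) may help. In the sketch below, $\mu_n$ is assumed to be the probability measure on $[0, 1]$ that puts mass $1/n$ at each point $k/n$, $k = 1, \ldots, n$, and $\mu$ is Lebesgue measure on $[0, 1]$; both choices, and the function names, are made for this example only. Integrals with respect to $\mu_n$ are then Riemann sums.

```python
from fractions import Fraction

def integral_wrt_mu_n(f, n):
    """Integral of f with respect to mu_n (assumed: mass 1/n at each point k/n, k = 1, ..., n)."""
    return sum(f(Fraction(k, n)) for k in range(1, n + 1)) / n

def square(x):
    """A bounded continuous test function on [0, 1]; its Lebesgue integral is 1/3."""
    return x * x

for n in (10, 100, 1000):
    print(n, float(integral_wrt_mu_n(square, n)))
```

The printed values approach 1/3. The hypothesis $\mu(\partial A) = 0$ in (e) cannot be dropped: if $A$ is the set of rationals in $[0, 1]$, then $\mu_n(A) = 1$ for every $n$ while $\mu(A) = 0$, and here $\partial A = [0, 1]$ has $\mu$-measure 1.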


2.8.2 Comments. Another condition equivalent to those of 2.8.1 is that $\int_{\Omega} f \, d\mu_n \to \int_{\Omega} f \, d\mu$ for all bounded uniformly continuous $f: \Omega \to \mathbb{R}$ (see Problem 1).

The convergence described in 2.8.1 is sometimes called weak or vague convergence of measures. We shall write $\mu_n \xrightarrow{w} \mu$.

If the measures $\mu_n$ and $\mu$ are defined on $\mathcal{B}(\mathbb{R})$, there are corresponding distribution functions $F_n$ and $F$ on $\mathbb{R}$. We may relate convergence of measures to convergence of distribution functions.


2.8.3 Definition. A continuity point of a distribution function $F$ on $\mathbb{R}$ is a point $x \in \mathbb{R}$ such that $F$ is continuous at $x$, or $\pm\infty$ (thus by convention, $\infty$ and $-\infty$ are continuity points).


2.8.4 Theorem. Let $\mu, \mu_1, \mu_2, \ldots$ be finite measures on $\mathcal{B}(\mathbb{R})$, with corresponding distribution functions $F, F_1, F_2, \ldots$. The following are equivalent:

(a) $\mu_n \xrightarrow{w} \mu$.

(b) $F_n(a, b] \to F(a, b]$ at all continuity points $a, b$ of $F$, where $F(a, b] = F(b) - F(a)$, $F(\infty) = \lim_{x \to \infty} F(x)$, $F(-\infty) = \lim_{x \to -\infty} F(x)$.

If all distribution functions are 0 at $-\infty$, condition (b) is equivalent to the statement that $F_n(x) \to F(x)$ at all points $x \in \mathbb{R}$ at which $F$ is continuous, and $F_n(\infty) \to F(\infty)$.


Proof. (a) implies (b): If $a$ and $b$ are continuity points of $F$ in $\mathbb{R}$, then $(a, b]$ is a Borel set whose boundary has $\mu$-measure 0. By 2.8.1(e), $\mu_n(a, b] \to \mu(a, b]$, that is, $F_n(a, b] \to F(a, b]$. If $a = -\infty$, the argument is the same, and if $b = \infty$, then $(a, \infty)$ is a Borel set whose boundary has $\mu$-measure 0, and the proof proceeds as before.

(b) implies (a): Let $A$ be an open subset of $\mathbb{R}$; write $A$ as the disjoint union of open intervals $I_1, I_2, \ldots$. Then

$$\liminf_{n \to \infty} \mu_n(A) = \liminf_{n \to \infty} \sum_k \mu_n(I_k) \ge \sum_k \liminf_{n \to \infty} \mu_n(I_k) \quad \text{by Fatou's lemma.}$$

Let $\varepsilon > 0$ be given. For each $k$, let $I_k'$ be a right-semiclosed subinterval of $I_k$ such that the endpoints of $I_k'$ are continuity points of $F$, and $\mu(I_k') \ge \mu(I_k) - \varepsilon 2^{-k}$; the $I_k'$ can be chosen since $F$ has only countably many discontinuities. Then

$$\liminf_{n \to \infty} \mu_n(I_k) \ge \liminf_{n \to \infty} \mu_n(I_k') = \mu(I_k') \quad \text{by (b).}$$

Thus

$$\liminf_{n \to \infty} \mu_n(A) \ge \sum_k \mu(I_k') \ge \sum_k \mu(I_k) - \varepsilon = \mu(A) - \varepsilon.$$

Since $\varepsilon$ is arbitrary, we have $\mu_n \xrightarrow{w} \mu$ by 2.8.1(d). □

Condition (b) of 2.8.4 is sometimes called weak convergence of the sequence $\{F_n\}$ to $F$, written $F_n \xrightarrow{w} F$.
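
The restriction to continuity points in 2.8.4 is essential, as a standard example shows. In the sketch below, $\mu_n$ is assumed to be the unit mass at $1/n$ and $\mu$ the unit mass at 0 (so that $\int f \, d\mu_n = f(1/n) \to f(0)$ for continuous $f$); the function names are illustrative.

```python
def F_n(x, n):
    """Distribution function of the unit mass at 1/n (assumed example)."""
    return 1.0 if x >= 1.0 / n else 0.0

def F(x):
    """Distribution function of the unit mass at 0."""
    return 1.0 if x >= 0 else 0.0

for x in (-0.5, 0.0, 0.25):
    values = [F_n(x, n) for n in (1, 10, 100, 1000)]
    print(x, values, "F(x) =", F(x))
```

At $x = -0.5$ and $x = 0.25$, where $F$ is continuous, the values $F_n(x)$ eventually equal $F(x)$. At the discontinuity $x = 0$ we have $F_n(0) = 0$ for every $n$ while $F(0) = 1$, so pointwise convergence fails there even though the measures converge weakly.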


Problems 


1. (a) If $F$ is a closed subset of the metric space $\Omega$, show that $I_F$ is the limit of a decreasing sequence of uniformly continuous functions $f_n$, with $0 \le f_n \le 1$ for all $n$.

(b) Show that in 2.8.1, $\mu_n \xrightarrow{w} \mu$ iff $\int_{\Omega} f \, d\mu_n \to \int_{\Omega} f \, d\mu$ for all bounded uniformly continuous $f: \Omega \to \mathbb{R}$.

2. Show that in 2.8.1, $\mu_n \xrightarrow{w} \mu$ iff $\mu_n(A) \to \mu(A)$ for all open sets $A$ such that $\mu(\partial A) = 0$.


2.9 REFERENCES 

The presentation in Chapters 1 and 2 has been strongly influenced by several sources. The first systematic presentation of measure theory appeared in Halmos (1950). Halmos achieves slightly greater generality at the expense of technical complications by replacing σ-fields by σ-rings. (A σ-ring is a class of sets closed under differences and countable unions.) However, σ-fields will be completely adequate for our purposes. The first account of measure theory specifically oriented toward probability was given by Loève (1955). Several useful refinements were made by Royden (1963), Neveu (1965), and Rudin (1966). Neveu's book emphasizes probability while Rudin's book is particularly helpful as a preparation for work in harmonic analysis.

For further properties of finitely additive set functions, and a development 
of integration theory for functions with values in a Banach space, see Dunford 
and Schwartz (1958). 

A more recent treatment of measure and integration theory is that of Folland 
(1984), who discusses applications to Fourier analysis and probability. A de- 
velopment of measure theory that emphasizes the connection with probability 
is given by Doob (1994). 


CHAPTER 3

INTRODUCTION TO FUNCTIONAL ANALYSIS


3.1 INTRODUCTION 

An important part of analysis consists of the study of vector spaces endowed with an additional structure of some kind. In Chapter 2, for example, we studied the vector space $L^p(\Omega, \mathcal{F}, \mu)$. If $1 \le p \le \infty$, the seminorm $\|\ \|_p$ allowed us to talk about such notions as distance, convergence, and completeness.

In this chapter, we look at various structures that can be defined on vector 
spaces. The most general concept, which we will not study in detail, is that of 
a topological vector space, which is a vector space endowed with a topology 
compatible with the algebraic operations, that is, the topology makes vector 
addition and scalar multiplication continuous. We will concentrate on two 
special cases, Banach and Hilbert spaces. In a Banach space there is a notion 
of length of a vector, and in a Hilbert space, length is in turn determined 
by a “dot product” of vectors. Hilbert spaces are a natural generalization of 
finite-dimensional Euclidean spaces. 

We now list the spaces we are going to study. The term “vector space” 
will always mean vector space over the complex field C; “real vector space” 
indicates that the scalar field is R; no other fields will be considered. 


3.1.1 Definitions. Let $L$ be a vector space. A seminorm on $L$ is a function $\|\ \|$ from $L$ to the nonnegative reals satisfying

$$\|ax\| = |a| \, \|x\| \quad \text{for all } a \in \mathbb{C},\ x \in L,$$

$$\|x + y\| \le \|x\| + \|y\| \quad \text{for all } x, y \in L.$$

The first property is called absolute homogeneity, the second subadditivity. Note that absolute homogeneity implies that $\|0\| = 0$. (We use the same symbol for the zero vector and the zero scalar.) If, in addition, $\|x\| = 0$ implies that $x = 0$, the seminorm is called a norm on $L$ and $L$ is said to be a normed linear space.

If $\|\ \|$ is a seminorm on $L$, and $d(x, y) = \|x - y\|$, $x, y \in L$, $d$ is a pseudometric on $L$; a metric if $\|\ \|$ is a norm. A Banach space is a complete normed linear space, that is, relative to the metric $d$ induced by the norm, every Cauchy sequence converges.

An inner product on $L$ is a function from $L \times L$ to $\mathbb{C}$, denoted by $(x, y) \to \langle x, y \rangle$, satisfying

$$\langle ax + by, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle \quad \text{for all } a, b \in \mathbb{C},\ x, y, z \in L,$$

$$\langle x, y \rangle = \overline{\langle y, x \rangle} \quad \text{for all } x, y \in L,$$

$$\langle x, x \rangle \ge 0 \quad \text{for all } x \in L,$$

$$\langle x, x \rangle = 0 \quad \text{if and only if } x = 0$$

(the over-bar indicates complex conjugation). A vector space endowed with an inner product is called an inner product space or pre-Hilbert space. If $L$ is an inner product space, $\|x\| = (\langle x, x \rangle)^{1/2}$ defines a norm on $L$; this is a consequence of the Cauchy-Schwarz inequality, to be proved in Section 3.2. If, with this norm, $L$ is complete, $L$ is said to be a Hilbert space. Thus a Hilbert space is a Banach space whose norm is determined by an inner product.
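
The axioms above can be checked numerically on the finite-dimensional space $\mathbb{C}^3$ with its standard inner product. The Python sketch below is an illustration only; the vectors `x`, `y` and the scalar `a` are arbitrary sample values, and a finite check of this kind does not replace the proofs in Section 3.2.

```python
import math

def inner(x, y):
    """Standard inner product on C^n: linear in the first argument, conjugate-linear in the second."""
    return sum(s * t.conjugate() for s, t in zip(x, y))

def norm(x):
    """Norm induced by the inner product, ||x|| = <x, x>^(1/2)."""
    return math.sqrt(inner(x, x).real)

x = [1 + 2j, -1j, 3 + 0j]   # sample vectors, chosen arbitrarily
y = [2 - 1j, 4 + 0j, 1j]
a = 2 - 3j                  # sample scalar

conjugate_symmetry = inner(x, y) == inner(y, x).conjugate()
cauchy_schwarz = abs(inner(x, y)) <= norm(x) * norm(y)
subadditivity = norm([s + t for s, t in zip(x, y)]) <= norm(x) + norm(y)
homogeneity = math.isclose(norm([a * s for s in x]), abs(a) * norm(x))
print(conjugate_symmetry, cauchy_schwarz, subadditivity, homogeneity)
```

All four checks hold for these sample values.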

Finally, a topological vector space is a vector space $L$ with a topology such that addition and scalar multiplication are continuous, in other words, the mappings

$$(x, y) \to x + y \quad \text{of } L \times L \text{ into } L$$

and

$$(a, x) \to ax \quad \text{of } \mathbb{C} \times L \text{ into } L$$

are continuous, with the product topology on $L \times L$ and $\mathbb{C} \times L$.

A Banach space is a topological vector space with the topology induced by the metric $d(x, y) = \|x - y\|$. For if $x_n \to x$ and $y_n \to y$, then

$$\|x_n + y_n - (x + y)\| \le \|x_n - x\| + \|y_n - y\| \to 0;$$

if $a_n \to a$ and $x_n \to x$, then

$$\|a_n x_n - ax\| \le \|a_n x_n - a_n x\| + \|a_n x - ax\| = |a_n| \, \|x_n - x\| + |a_n - a| \, \|x\| \to 0.$$


The above definitions remain unchanged if L is a real vector space, except 
of course that C is replaced by R. Also, we may drop the complex conju- 
gate in the symmetry requirement for inner product and simply wnite (x, y) 
= (y, x) for all x, y E L. 
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3.1.2 Examples. (a) If (Q,.¥, 2) is a measure space and 1 < p < œ, || ||, 
is a Seminorm on the vector space L? (Q, Z, u). If we pass to equivalence 
classes by identifying functions that agree a.e. [u], we obtain L? (Q, F, u), 
a Banach space (see 2.4). When p = 2, the norm || ||, is determined by an 
inner product 


(fg) = J fgdu. 


Hence L?(Q,.Z, u) is a Hilbert space. 

If Y consists of all subsets of Q and u is counting measure, then f = g 
a.e. [u] implies f = g. Thus it is not necessary to pass to equivalence classes; 
LP (Q, Z, u) is a Banach space, denoted for simplicity by /?(Q). 

By 2.4.12, if 1 < p < œ, then /?(Q) consists of all functions f = (f(a), 
a € Q) from Q to C such that f(œæ) = 0 for all but countably many a, and 
IFS = dua lf (@)|? < co. When p = 2, the norm on I*(Q) is induced by the 
inner product 


When p = œ, the situation is slightly different. The space [°° (Q) is the 
collection of all bounded complex-valued functions on Q, with the sup norm 


FI] = sup{| FG): x € 2}. 


Similarly, if Q is a metric space (or more generally, a topological space) 
and L is the class of all bounded continuous complex-valued functions on Q, 
then L is a Banach space under the sup norm, for we may verify directly that 
the sup norm is actually a norm, or equally well we may use the fact that 
$L \subset l^\infty(\Omega)$. Thus we need only check completeness, and this follows because
a uniform limit of continuous functions is continuous. 

(b) Let c be the set of all convergent sequences of complex numbers, and
put the sup norm on c; if $f = \{a_n, n \ge 1\} \in c$, then

$$\|f\| = \sup\{|a_n|: n = 1, 2, \ldots\}.$$

Again, to show that c is a Banach space we need only establish completeness.
Let $\{f_n\}$ be a Cauchy sequence in c; if $f_n = \{a_{nk}, k \ge 1\}$, then
$\lim_{k \to \infty} a_{nk}$ exists for each n since $f_n \in c$, and
$b_k = \lim_{n \to \infty} a_{nk}$ exists, uniformly in k, since
$|a_{nk} - a_{mk}| \le \|f_n - f_m\| \to 0$ as $n, m \to \infty$. By the
standard double limit theorem,

$$\lim_{n \to \infty} \lim_{k \to \infty} a_{nk} = \lim_{k \to \infty} \lim_{n \to \infty} a_{nk}.$$

In particular, $\lim_{k \to \infty} b_k$ exists, so if $f = \{b_k, k \ge 1\}$,
then $f \in c$. But $\|f_n - f\| = \sup_k |a_{nk} - b_k| \to 0$ as
$n \to \infty$ since $a_{nk} \to b_k$ uniformly in k. This proves completeness.

(c) (If you are unfamiliar with general topology, you may skip this example.) 

Let L be the collection of all complex-valued functions on S, where S is
an arbitrary set. Put the topology of pointwise convergence on L, so that a
sequence or net $\{f_n\}$ of functions in L converges to the function
$f \in L$ if and only if $f_n(x) \to f(x)$ for each $x \in S$. With this
topology, L is a topological vector space. To show that addition is
continuous, observe that if $f_n \to f$ and $g_n \to g$ pointwise, then
$f_n + g_n \to f + g$ pointwise. Similarly if $a_n \in C$, $n = 1, 2, \ldots$,
$a_n \to a$, and $f_n \to f$ pointwise, then $a_n f_n \to af$ pointwise.


3.2 BASIC PROPERTIES OF HILBERT SPACES

Hilbert spaces are a natural generalization of finite-dimensional Euclidean
spaces in the sense that many of the familiar geometric results in $R^n$ carry
over. First recall the definition of the inner product (or "dot product") on
$R^n$: If $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, then
$\langle x, y \rangle = \sum_{j=1}^n x_j y_j$. (This becomes
$\sum_{j=1}^n x_j \bar{y}_j$ in the space $C^n$ of all n-tuples of complex
numbers.) The length of a vector in $R^n$ is given by
$\|x\| = (\langle x, x \rangle)^{1/2} = (\sum_{j=1}^n x_j^2)^{1/2}$, and the
distance between two points of $R^n$ is $d(x, y) = \|x - y\|$. In order to
show that d is a metric, the triangle inequality must be established; this in
turn follows from the Cauchy-Schwarz inequality
$|\langle x, y \rangle| \le \|x\|\,\|y\|$. In fact the Cauchy-Schwarz
inequality holds in any inner product space, as we now prove:


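3.2.1 Cauchy-Schwarz Inequality. If L is an inner product space, and
$\|x\| = (\langle x, x \rangle)^{1/2}$, $x \in L$, then

$$|\langle x, y \rangle| \le \|x\|\,\|y\| \quad \text{for all } x, y \in L.$$

Equality holds iff x and y are linearly dependent.

Proof. For any $a \in C$,

$$\begin{aligned}
0 \le \langle x + ay, x + ay \rangle &= \langle x + ay, x \rangle + \langle x + ay, ay \rangle \\
&= \langle x, x \rangle + a \langle y, x \rangle + \bar{a} \langle x, y \rangle + |a|^2 \langle y, y \rangle.
\end{aligned}$$

Set $a = -\langle x, y \rangle / \langle y, y \rangle$ (if
$\langle y, y \rangle = 0$, then $y = 0$ and the result is trivial). Since
$\langle y, x \rangle = \overline{\langle x, y \rangle}$, we have

$$0 \le \langle x, x \rangle - 2 \frac{|\langle x, y \rangle|^2}{\langle y, y \rangle} + \frac{|\langle x, y \rangle|^2}{\langle y, y \rangle},$$

proving the inequality.

As $\langle x + ay, x + ay \rangle = 0$ iff $x + ay = 0$, equality holds iff
x and y are linearly dependent. □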
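
As a numerical illustration of 3.2.1, the following minimal sketch checks the
inequality in $C^3$ with the inner product $\sum_j x_j \bar{y}_j$. The vectors
and the helper names `inner` and `norm` are assumptions chosen for
illustration only.

```python
import math

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

x = [1 + 2j, 3 - 1j, 0.5j]
y = [2 - 1j, 1j, 4 + 0j]

# x and y are linearly independent, so the inequality is strict
gap = norm(x) * norm(y) - abs(inner(x, y))

# w is a scalar multiple of x, so the two sides agree up to rounding
w = [(2 - 3j) * a for a in x]
dependent_gap = norm(x) * norm(w) - abs(inner(x, w))
```

Here `gap` is strictly positive, while `dependent_gap` vanishes up to
floating-point error, matching the equality condition of the theorem.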


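3.2.2 Corollary. If L is an inner product space and
$\|x\| = (\langle x, x \rangle)^{1/2}$, $x \in L$, then $\|\ \|$ is a norm on L.

Proof. It is immediate that $\|x\| \ge 0$, $\|ax\| = |a|\,\|x\|$, and
$\|x\| = 0$ iff $x = 0$. Now

$$\begin{aligned}
\|x + y\|^2 = \langle x + y, x + y \rangle &= \|x\|^2 + \|y\|^2 + \langle x, y \rangle + \langle y, x \rangle \\
&= \|x\|^2 + \|y\|^2 + 2 \operatorname{Re} \langle x, y \rangle \\
&\le \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| \quad \text{by 3.2.1.}
\end{aligned}$$

Therefore $\|x + y\|^2 \le (\|x\| + \|y\|)^2$. □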


3.2.3 Corollary. An inner product is (jointly) continuous in both variables,
that is, $x_n \to x$, $y_n \to y$ implies
$\langle x_n, y_n \rangle \to \langle x, y \rangle$.

Proof.

$$\begin{aligned}
|\langle x_n, y_n \rangle - \langle x, y \rangle| &= |\langle x_n, y_n - y \rangle + \langle x_n - x, y \rangle| \\
&\le \|x_n\|\,\|y_n - y\| + \|x_n - x\|\,\|y\| \quad \text{by 3.2.1.}
\end{aligned}$$

However, by subadditivity of the norm, $\|x_n\| \le \|x_n - x\| + \|x\|$ and
$\|x\| \le \|x - x_n\| + \|x_n\|$, and, therefore,

$$\bigl|\, \|x_n\| - \|x\| \,\bigr| \le \|x_n - x\| \to 0;$$

hence $\|x_n\| \to \|x\|$. It follows that
$\langle x_n, y_n \rangle \to \langle x, y \rangle$. □

The computation of 3.2.2 establishes the following result, which says
geometrically that the sum of the squares of the lengths of the diagonals of a
parallelogram is twice the sum of the squares of the lengths of the sides.

3.2.4 Parallelogram Law. In an inner product space,

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2).$$

Proof.

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2 \operatorname{Re} \langle x, y \rangle,$$

and

$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2 \operatorname{Re} \langle x, y \rangle. \quad □$$


Now suppose that $x_1, \ldots, x_n$ are mutually perpendicular unit vectors in
$R^k$, $k \ge n$. If x is an arbitrary vector in $R^k$, we try to approximate x
by a linear combination $\sum_{j=1}^n a_j x_j$. The reader may recall that
$\sum_{j=1}^n a_j x_j$ will be closest to x in the sense of Euclidean distance
when $a_j = \langle x, x_j \rangle$. This result holds in an arbitrary inner
product space.


3.2.5 Definition. Two elements x and y in an inner product space L are
said to be orthogonal or perpendicular iff $\langle x, y \rangle = 0$. If
$B \subset L$, B is said to be orthogonal iff $\langle x, y \rangle = 0$ for
all $x, y \in B$ such that $x \ne y$; B is orthonormal iff it is orthogonal
and $\|x\| = 1$ for all $x \in B$.

The computation of 3.2.2 shows that if $x_1, x_2, \ldots, x_n$ are orthogonal,
the Pythagorean relation holds:
$\|\sum_{i=1}^n x_i\|^2 = \sum_{i=1}^n \|x_i\|^2$.


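3.2.6 Theorem. If $\{x_1, \ldots, x_n\}$ is an orthonormal set in the inner
product space L, and $x \in L$,

$$\Bigl\| x - \sum_{j=1}^n a_j x_j \Bigr\| \text{ is minimized when } a_j = \langle x, x_j \rangle, \quad j = 1, \ldots, n.$$

Proof.

$$\begin{aligned}
\Bigl\| x - \sum_{j=1}^n a_j x_j \Bigr\|^2 &= \Bigl\langle x - \sum_{j=1}^n a_j x_j, \; x - \sum_{k=1}^n a_k x_k \Bigr\rangle \\
&= \|x\|^2 - \sum_{k=1}^n \bar{a}_k \langle x, x_k \rangle - \sum_{j=1}^n a_j \langle x_j, x \rangle
 + \Bigl\langle \sum_{j=1}^n a_j x_j, \; \sum_{k=1}^n a_k x_k \Bigr\rangle.
\end{aligned}$$

The last term on the right is $\sum_{j=1}^n |a_j|^2$ since the $x_j$ are
orthonormal. Furthermore,
$-\bar{a}_j \langle x, x_j \rangle - a_j \langle x_j, x \rangle + |a_j|^2
= -|\langle x, x_j \rangle|^2 + |a_j - \langle x, x_j \rangle|^2$. Thus

$$0 \le \Bigl\| x - \sum_{j=1}^n a_j x_j \Bigr\|^2 = \|x\|^2 - \sum_{j=1}^n |\langle x, x_j \rangle|^2 + \sum_{j=1}^n |a_j - \langle x, x_j \rangle|^2, \qquad (1)$$

so that we can do no better than to take $a_j = \langle x, x_j \rangle$. □

The above computation establishes the following important inequality.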


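3.2.7 Bessel's Inequality. If B is an arbitrary orthonormal subset of the inner
product space L and x is an element of L, then

$$\|x\|^2 \ge \sum_{y \in B} |\langle x, y \rangle|^2.$$

In other words, $\langle x, y \rangle = 0$ for all but countably many
$y \in B$, say $y = x_1, x_2, \ldots$, and

$$\|x\|^2 \ge \sum_j |\langle x, x_j \rangle|^2.$$

Equality holds iff $\sum_{j=1}^n \langle x, x_j \rangle x_j \to x$ as
$n \to \infty$.


Proof. If $x_1, \ldots, x_n \in B$, set $a_j = \langle x, x_j \rangle$ in
Eq. (1) of 3.2.6 to obtain
$\|x - \sum_{j=1}^n \langle x, x_j \rangle x_j\|^2
= \|x\|^2 - \sum_{j=1}^n |\langle x, x_j \rangle|^2 \ge 0$.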
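
The minimizing property of 3.2.6 and Bessel's inequality can be observed
numerically. The sketch below works in $C^3$ with a two-element orthonormal
set that is not a basis; the specific vectors and helper names are assumptions
made for illustration.

```python
import math

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def error(x, coeffs, vectors):
    """Distance from x to the linear combination sum_j coeffs[j] * vectors[j]."""
    approx = [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(len(x))]
    return norm([a - b for a, b in zip(x, approx)])

s = 1 / math.sqrt(2)
orthonormal = [[s + 0j, s + 0j, 0j], [s + 0j, -s + 0j, 0j]]
x = [1 + 1j, 2 - 1j, 3 + 0j]

fourier = [inner(x, e) for e in orthonormal]       # a_j = <x, x_j>
best = error(x, fourier, orthonormal)
perturbed = error(x, [fourier[0] + 0.1, fourier[1] - 0.2j], orthonormal)
bessel_sum = sum(abs(c) ** 2 for c in fourier)
```

With these choices `best` is smaller than `perturbed`, and
`norm(x)**2 - bessel_sum` equals `best**2` up to rounding, which is Eq. (1)
of 3.2.6 with $a_j = \langle x, x_j \rangle$.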


We now consider another basic geometric idea, that of projection. If M is a
subspace of $R^n$ and x is any vector in $R^n$, x can be resolved into a
component in M and a component perpendicular to M. In other words,
$x = y + z$ where $y \in M$ and z is orthogonal to every vector in M. Before
generalizing to an arbitrary space, we indicate some terminology.


3.2.8 Definitions. A subspace or linear manifold of a vector space L is a 
subset M of L that is also a vector space; that is, M is closed under addition 
and scalar multiplication. The subset M is said to be a closed subspace of L 
if M is a subspace and is also a closed set in the metric of L. 

A subset M of the vector space L is said to be convex iff for all $x, y \in M$,
we have $ax + (1 - a)y \in M$ for all real $a \in [0, 1]$.

A key fact is required: If M is a closed convex subset of the Hilbert space 
H and x is an arbitrary point of H, there is a unique point of M closest to x. 


3.2.9 Theorem. Let M be a nonempty closed convex subset of the Hilbert
space H. If $x \in H$, there is a unique element $y_0 \in M$ such that

$$\|x - y_0\| = \inf\{\|x - y\|: y \in M\}.$$

Proof. Let $d = \inf\{\|x - y\|: y \in M\}$, and pick points
$y_1, y_2, \ldots \in M$ with $\|x - y_n\| \to d$ as $n \to \infty$; we show
that $\{y_n\}$ is a Cauchy sequence.

Since $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$ for all $u, v \in H$
by the parallelogram law 3.2.4, we may set $u = y_n - x$, $v = y_m - x$ to
obtain

$$\|y_n + y_m - 2x\|^2 + \|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2$$

or

$$\|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\bigl\|\tfrac{1}{2}(y_n + y_m) - x\bigr\|^2.$$

Since $\tfrac{1}{2}(y_n + y_m) \in M$ by convexity,
$\|\tfrac{1}{2}(y_n + y_m) - x\|^2 \ge d^2$, and it follows that
$\|y_n - y_m\| \to 0$ as $n, m \to \infty$.

By completeness of H, $y_n$ approaches a limit $y_0$, hence
$\|x - y_n\| \to \|x - y_0\|$. But then $\|x - y_0\| = d$, and $y_0 \in M$
since M is closed; this finishes the existence part of the proof.

To prove uniqueness, let $y_0, z_0 \in M$, with
$\|x - y_0\| = \|x - z_0\| = d$. In the parallelogram law, take
$u = y_0 - x$, $v = z_0 - x$, to obtain

$$\|y_0 + z_0 - 2x\|^2 + \|y_0 - z_0\|^2 = 2\|y_0 - x\|^2 + 2\|z_0 - x\|^2 = 4d^2.$$

But $\|y_0 + z_0 - 2x\|^2 = 4\|\tfrac{1}{2}(y_0 + z_0) - x\|^2 \ge 4d^2$;
hence $\|y_0 - z_0\| = 0$, so $y_0 = z_0$. □
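
A concrete instance of 3.2.9 may help. In the sketch below, H is taken to be
$R^2$ and M the closed unit ball, for which the closest point to x is
$x/\|x\|$ when $\|x\| > 1$. The choice of M, the sample grid, and the function
names are assumptions made for illustration.

```python
import math

def closest_in_unit_ball(x):
    """Closest point of the closed unit ball {y : ||y|| <= 1} in R^n to x."""
    r = math.sqrt(sum(t * t for t in x))
    return list(x) if r <= 1 else [t / r for t in x]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x = [3.0, 4.0]
y0 = closest_in_unit_ball(x)     # the point x / ||x||
d = dist(x, y0)                  # approximately 4, that is, ||x|| - 1

# sample points of the ball on a grid; none of them is closer to x than y0
grid = [i / 10 for i in range(-10, 11)]
samples = [[a, b] for a in grid for b in grid if a * a + b * b <= 1]
closest_sample = min(dist(x, y) for y in samples)
```

The grid search is only a sanity check on finitely many points of M; the
theorem itself guarantees that no point of M is closer to x than $y_0$.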


If M is a closed subspace of H, the element $y_0$ found in 3.2.9 is called the
projection of x on M. The following result helps to justify this terminology.


3.2.10 Theorem. Let M be a closed subspace of the Hilbert space H, and
$y_0$ an element of M. Then

$$\|x - y_0\| = \inf\{\|x - y\|: y \in M\} \quad \text{iff} \quad x - y_0 \perp M,$$

that is, $\langle x - y_0, y \rangle = 0$ for all $y \in M$.

Proof. Assume $x - y_0 \perp M$. If $y \in M$, then

$$\begin{aligned}
\|x - y\|^2 &= \|x - y_0 - (y - y_0)\|^2 \\
&= \|x - y_0\|^2 + \|y - y_0\|^2 - 2 \operatorname{Re} \langle x - y_0, y - y_0 \rangle \\
&= \|x - y_0\|^2 + \|y - y_0\|^2 \quad \text{since } y - y_0 \in M \\
&\ge \|x - y_0\|^2.
\end{aligned}$$

Therefore, $\|x - y_0\| = \inf\{\|x - y\|: y \in M\}$.

Conversely, assume $\|x - y_0\| = \inf\{\|x - y\|: y \in M\}$. Let $y \in M$
and let c be an arbitrary complex number. Then $y_0 + cy \in M$ since M is a
subspace, hence $\|x - y_0 - cy\| \ge \|x - y_0\|$. But

$$\|x - y_0 - cy\|^2 = \|x - y_0\|^2 + |c|^2 \|y\|^2 - 2 \operatorname{Re} \langle x - y_0, cy \rangle;$$

hence

$$|c|^2 \|y\|^2 - 2 \operatorname{Re} \langle x - y_0, cy \rangle \ge 0.$$

Take $c = b \langle x - y_0, y \rangle$, b real. Then
$\langle x - y_0, cy \rangle = b |\langle x - y_0, y \rangle|^2$. Thus
$|\langle x - y_0, y \rangle|^2 [b^2 \|y\|^2 - 2b] \ge 0$. But the expression in
square brackets is negative if b is positive and sufficiently close to 0;
hence $\langle x - y_0, y \rangle = 0$. □


We may give still another way of characterizing the projection of x on M. 


3.2.11 Projection Theorem. Let M be a closed subspace of the Hilbert space
H. If $x \in H$, then x has a unique representation $x = y + z$ where
$y \in M$ and $z \perp M$. Furthermore, y is the projection of x on M.


Proof. Let $y_0$ be the projection of x on M, and take $y = y_0$,
$z = x - y_0$. By 3.2.10, $z \perp M$, proving the existence of the desired
representation. To prove uniqueness, let $x = y + z = y' + z'$ where
$y, y' \in M$, $z, z' \perp M$. Then $y - y' \in M$ since M is a subspace, and
$y - y' \perp M$ since $y - y' = z' - z$. Thus $y - y'$ is orthogonal to
itself, hence $y = y'$. But then $z = z'$, proving uniqueness. □


If M is any subset of H, the set $M^\perp = \{x \in H: x \perp M\}$ is a closed
subspace by definition of the inner product and 3.2.3. If M is a closed
subspace, $M^\perp$ is called the orthogonal complement of M, and the
projection theorem is expressed by saying that H is the orthogonal direct sum
of M and $M^\perp$, written $H = M \oplus M^\perp$.

In $R^n$, it is possible to construct an orthonormal basis, that is, a set
$\{x_1, \ldots, x_n\}$ of n mutually perpendicular unit vectors. Any vector x
in $R^n$ may then be represented as
$x = \sum_{i=1}^n \langle x, x_i \rangle x_i$, so that
$\langle x, x_i \rangle$ is the component of x in the direction of $x_i$. We
are now able to generalize this idea to an arbitrary Hilbert space. The
following terminology will be used.


3.2.12 Definitions. If B is a subset of the normed linear space (or more
generally, the topological vector space) L, the space spanned by B, denoted
by S(B), is the smallest closed subspace of L containing all elements of B.
If L(B) is the linear manifold generated by B, that is, L(B) consists of
all elements $\sum_{i=1}^n a_i x_i$, $a_i \in C$, $x_i \in B$,
$i = 1, \ldots, n$, $n = 1, 2, \ldots$, then $S(B) = \overline{L(B)}$, the
closure of L(B).

If B is a subset of the Hilbert space H, B is said to be an orthonormal
basis for H iff B is a maximal orthonormal subset of H, in other words, B
is not a proper subset of any other orthonormal subset of H. An orthonormal
set $B \subset H$ is maximal iff $S(B) = H$, and there are several other
conditions equivalent to this, as we now prove.

3.2.13 Theorem. Let $B = \{x_\alpha, \alpha \in I\}$ be an orthonormal subset
of the Hilbert space H. The following conditions are equivalent:

(a) B is an orthonormal basis.

(b) B is a "complete orthonormal set," that is, the only $x \in H$ such that
$x \perp B$ is $x = 0$.

(c) B spans H, that is, $S(B) = H$.

(d) For all $x \in H$, $x = \sum_\alpha \langle x, x_\alpha \rangle x_\alpha$.
(Let us explain this notation. By 3.2.7, $\langle x, x_\alpha \rangle = 0$ for
all but countably many $x_\alpha$, say for $x_1, x_2, \ldots$; the assertion is
that $\sum_{j=1}^n \langle x, x_j \rangle x_j \to x$, and this holds regardless
of the order in which the $x_j$ are listed.)

(e) For all $x, y \in H$,
$\langle x, y \rangle = \sum_\alpha \langle x, x_\alpha \rangle \langle x_\alpha, y \rangle$.

(f) For all $x \in H$, $\|x\|^2 = \sum_\alpha |\langle x, x_\alpha \rangle|^2$.


Condition (f) [and sometimes (e) as well] is referred to as the Parseval
relation.


Proof. (a) implies (b): If $x \perp B$, $x \ne 0$, let $y = x/\|x\|$. Then
$B \cup \{y\}$ is an orthonormal set, contradicting the maximality of B.

(b) implies (c): If $x \in H$, write $x = y + z$ where $y \in S(B)$ and
$z \perp S(B)$ (see 3.2.11). By (b), $z = 0$; hence $x \in S(B)$.

(c) implies (d): Since $S(B) = \overline{L(B)}$, given $x \in H$ and
$\varepsilon > 0$ there is a finite set $F \subset I$ and complex numbers
$a_\alpha$, $\alpha \in F$, such that

$$\Bigl\| x - \sum_{\alpha \in F} a_\alpha x_\alpha \Bigr\| < \varepsilon.$$

By 3.2.6, if G is any finite subset of I such that $F \subset G$,

$$\begin{aligned}
\Bigl\| x - \sum_{\alpha \in G} \langle x, x_\alpha \rangle x_\alpha \Bigr\|
&\le \Bigl\| x - \sum_{\alpha \in G} a_\alpha x_\alpha \Bigr\| \quad \text{where } a_\alpha = 0 \text{ for } \alpha \notin F \\
&= \Bigl\| x - \sum_{\alpha \in F} a_\alpha x_\alpha \Bigr\| < \varepsilon.
\end{aligned}$$

We may assume that $\langle x, x_\alpha \rangle \ne 0$ for every
$\alpha \in F$.

Thus if $x_1, x_2, \ldots$ is any ordering of the points $x_\alpha \in B$ for
which $\langle x, x_\alpha \rangle \ne 0$,
$\|x - \sum_{j=1}^n \langle x, x_j \rangle x_j\| < \varepsilon$ for
sufficiently large n, as desired.

(d) implies (e): This is immediate from 3.2.3.

(e) implies (f): Set $x = y$ in (e).

(f) implies (a): Let C be an orthonormal set with $B \subset C$, $B \ne C$. If
$x \in C$, $x \notin B$, we have
$\|x\|^2 = \sum_\alpha |\langle x, x_\alpha \rangle|^2 = 0$ since by
orthonormality of C, x is orthogonal to everything in B. This is a
contradiction because $\|x\| = 1$ for all $x \in C$. □


3.2.14 Corollary. Let $B = \{x_\alpha, \alpha \in I\}$ be an orthonormal subset
of H, not necessarily a basis.

(a) B is an orthonormal basis for S(B). [Note that S(B) is a closed subspace
of H, hence is itself a Hilbert space with the same inner product.]
(b) If $x \in H$ and y is the projection of x on S(B), then

$$y = \sum_\alpha \langle x, x_\alpha \rangle x_\alpha$$

[see 3.2.13(d) for the interpretation of the series].


Proof. (a) Note that the space spanned by B in S(B) is S(B) itself.
(b) By part (a) and 3.2.13(d),
$y = \sum_\alpha \langle y, x_\alpha \rangle x_\alpha$. But
$x - y \perp S(B)$ by 3.2.11, hence
$\langle x, x_\alpha \rangle = \langle y, x_\alpha \rangle$ for all $\alpha$. □
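
The formula of 3.2.14(b) can be tried out in finite dimensions. The sketch
below, in $C^4$, builds an orthonormal set from two spanning vectors by the
Gram-Schmidt process (see Problem 4) and then forms
$y = \sum_j \langle x, e_j \rangle e_j$. The vectors and function names are
illustrative assumptions, not notation from the text.

```python
import math

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(v, e)                       # component of v along e
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = math.sqrt(inner(w, w).real)
        basis.append([wi / n for wi in w])
    return basis

def project(x, basis):
    """y = sum_j <x, e_j> e_j for an orthonormal list `basis`."""
    y = [0j] * len(x)
    for e in basis:
        c = inner(x, e)
        y = [yi + c * ei for yi, ei in zip(y, e)]
    return y

spanning = [[1 + 0j, 1j, 0j, 1 + 0j], [0j, 1 + 0j, 1 - 1j, 2j]]
basis = gram_schmidt(spanning)
x = [2 + 1j, -1 + 0j, 3j, 1 - 2j]
y = project(x, basis)
z = [a - b for a, b in zip(x, y)]    # x - y, expected to be orthogonal to the span
```

Up to rounding, `z` is orthogonal to both spanning vectors, so
$x = y + z$ is the decomposition of the projection theorem 3.2.11 for this
two-dimensional subspace.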


A standard application of Zorn’s lemma shows that every Hilbert space has 
an orthonormal basis; an additional argument shows that any two orthonormal 
bases have the same cardinality (see Problem 5). This fact may be used to 
classify all possible Hilbert spaces, as follows. 


3.2.15 Theorem. Let S be an arbitrary set, and let H be a Hilbert space
with an orthonormal basis B having the same cardinality as S. Then there is
an isometric isomorphism (a one-to-one-onto, linear, norm-preserving map)
between H and $l^2(S)$.


Proof. We may write $B = \{x_\alpha, \alpha \in S\}$. If $x \in H$, 3.2.13(d)
then gives $x = \sum_\alpha \langle x, x_\alpha \rangle x_\alpha$, where
$\sum_\alpha |\langle x, x_\alpha \rangle|^2 = \|x\|^2 < \infty$ by 3.2.13(f).
The map $x \to (\langle x, x_\alpha \rangle, \alpha \in S)$ of H into $l^2(S)$
is therefore norm-preserving; since it is also linear, it must be one-to-one.
To show that the map is onto, consider any collection of complex numbers
$a_\alpha$, $\alpha \in S$, with $\sum_\alpha |a_\alpha|^2 < \infty$. Say
$a_\alpha = 0$ except for $\alpha = \alpha_1, \alpha_2, \ldots$, and let
$x = \sum_j a_{\alpha_j} x_{\alpha_j}$. [The series converges to an element of
H because of the following fact, which occurs often enough to be stated
separately: If $\{y_1, y_2, \ldots\}$ is an orthonormal subset of H, the series
$\sum_j c_j y_j$ converges to some element of H iff
$\sum_j |c_j|^2 < \infty$. To see this, observe that
$\|\sum_{j=n}^m c_j y_j\|^2 = \sum_{j=n}^m |c_j|^2 \|y_j\|^2 = \sum_{j=n}^m |c_j|^2$;
thus the partial sums form a Cauchy sequence iff $\sum_j |c_j|^2 < \infty$.]

Since the $x_\alpha$ are orthonormal, it follows that
$\langle x, x_\alpha \rangle = a_\alpha$ for all $\alpha$, so that x maps onto
$(a_\alpha, \alpha \in S)$. □

Theorem 3.2.15 is not as useful as it looks. For example, when working in
$L^2[0, 1]$, we usually take advantage of what we know about [0, 1].

We may also characterize Hilbert spaces that are separable, that is, have a 
countable dense set. 


3.2.16 Theorem. A Hilbert space H is separable iff it has a countable
orthonormal basis. If the orthonormal basis has n elements, H is isometrically
isomorphic to $C^n$; if the orthonormal basis is infinite, H is isometrically
isomorphic to $l^2$, that is, $l^2(S)$ with $S = \{1, 2, \ldots\}$.


Proof. Let B be an orthonormal basis for H. Now
$\|x - y\|^2 = \|x\|^2 + \|y\|^2 = 2$ for all $x, y \in B$, $x \ne y$, hence
the balls $A_x = \{y: \|y - x\| < \tfrac{1}{2}\}$, $x \in B$, are disjoint. If
D is dense in H, D must contain a point in each $A_x$, so that if B is
uncountable, D must be also, and therefore H cannot be separable.

Now assume B is a countable set $\{x_1, x_2, \ldots\}$. If U is a nonempty open
subset of H $[= S(B) = \overline{L(B)}]$, U contains an element of the form
$\sum_{j=1}^n a_j x_j$ with the $a_j \in C$; in fact the $a_j$ may be assumed
to be rational, in other words, to have rational numbers as real and imaginary
parts. Thus

$$\Bigl\{ \sum_{j=1}^n a_j x_j: n = 1, 2, \ldots, \text{ the } a_j \text{ rational} \Bigr\}$$

is a countable dense set, so that H is separable. The remaining statements of
the theorem follow from 3.2.15. □


A linear norm-preserving map from one Hilbert space to another auto- 
matically preserves inner products; this is a consequence of the following 
proposition. 


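3.2.17 Polarization Identity. In any inner product space,

$$4 \langle x, y \rangle = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2.$$

Proof.

$$\begin{aligned}
\|x + y\|^2 &= \|x\|^2 + \|y\|^2 + 2 \operatorname{Re} \langle x, y \rangle \\
\|x - y\|^2 &= \|x\|^2 + \|y\|^2 - 2 \operatorname{Re} \langle x, y \rangle \\
\|x + iy\|^2 &= \|x\|^2 + \|y\|^2 + 2 \operatorname{Re} \langle x, iy \rangle \\
\|x - iy\|^2 &= \|x\|^2 + \|y\|^2 - 2 \operatorname{Re} \langle x, iy \rangle
\end{aligned}$$

But $\operatorname{Re} \langle x, iy \rangle = \operatorname{Re}[-i \langle x, y \rangle] = \operatorname{Im} \langle x, y \rangle$,
and the result follows. □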
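
The polarization identity says that the norm alone determines the inner
product. The following sketch recovers $\langle x, y \rangle$ in $C^3$ from
four squared norms and also checks the parallelogram law 3.2.4; the vectors
and helper names are assumptions chosen for illustration.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def combine(x, y, c):
    """Return the vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]

x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 4 + 1j, -2j]

recovered = (
    norm_sq(combine(x, y, 1))
    - norm_sq(combine(x, y, -1))
    + 1j * norm_sq(combine(x, y, 1j))
    - 1j * norm_sq(combine(x, y, -1j))
) / 4

parallelogram_gap = (
    norm_sq(combine(x, y, 1)) + norm_sq(combine(x, y, -1))
    - 2 * (norm_sq(x) + norm_sq(y))
)
```

Here `recovered` agrees with `inner(x, y)` and `parallelogram_gap` is zero, up
to rounding. The sign pattern of the identity depends on the convention,
used throughout this section, that the inner product is linear in its first
argument.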
Problems


1. In the Hilbert space $l^2(S)$, show that the elements $e_\alpha$,
   $\alpha \in S$, form an orthonormal basis, where

   $$e_\alpha(s) = \begin{cases} 0, & s \ne \alpha, \\ 1, & s = \alpha. \end{cases}$$

2. (a) If A is an arbitrary subset of the Hilbert space H, show that
       $A^{\perp\perp} = S(A)$.
   (b) If M is a linear manifold of H, show that M is dense in H iff
       $M^\perp = \{0\}$.

3. Let $x_1, \ldots, x_n$ be elements of a Hilbert space. Show that the $x_i$
   are linearly dependent iff the Gramian (the determinant of the inner
   products $\langle x_i, x_j \rangle$, $i, j = 1, \ldots, n$) is 0.

4. (Gram-Schmidt process) Let $B = \{x_1, x_2, \ldots\}$ be a countable
   linearly independent subset of the Hilbert space H. Define
   $e_1 = x_1/\|x_1\|$; having chosen orthonormal elements
   $e_1, \ldots, e_n$, let $y_{n+1}$ be the projection of $x_{n+1}$ on the
   space spanned by $e_1, \ldots, e_n$:

   $$y_{n+1} = \sum_{i=1}^n \langle x_{n+1}, e_i \rangle e_i.$$

   Define

   $$e_{n+1} = \frac{x_{n+1} - y_{n+1}}{\|x_{n+1} - y_{n+1}\|}.$$

   (a) Show that $L\{e_1, \ldots, e_n\} = L\{x_1, \ldots, x_n\}$ for all n,
       hence $x_{n+1} \ne y_{n+1}$, and the process is well defined.
   (b) Show that the $e_n$ form an orthonormal basis for S(B).

   Comments. Consider the space $H = L^2(-1, 1)$; if we take $x_n(t) = t^n$,
   $n = 0, 1, \ldots$, the Gram-Schmidt process yields the Legendre
   polynomials $e_n(t) = a_n d^n[(t^2 - 1)^n]/dt^n$, where $a_n$ is chosen so
   that $\|e_n\| = 1$. Similarly, if in $L^2(-\infty, \infty)$ we take
   $x_n(t) = t^n e^{-t^2/2}$, $n = 0, 1, \ldots$, we obtain the Hermite
   polynomials $e_n(t) = a_n (-1)^n e^{t^2/2} d^n(e^{-t^2})/dt^n$.
5. (a) If you are familiar with Zorn’s Lemma, show that every Hilbert 
space has an orthonormal basis. 


(b) If you are familiar with cardinal arithmetic, show that any two 
orthonormal bases have the same cardinality. 


6. Let U be an open subset of the complex plane, and let H(U) be the
   collection of all functions f analytic on U such that

   $$\|f\|^2 = \iint_U |f(x + iy)|^2 \, dx \, dy < \infty.$$

   If we define

   $$\langle f, g \rangle = \iint_U f(x + iy) \overline{g(x + iy)} \, dx \, dy, \quad f, g \in H(U),$$

   H(U) becomes an inner product space.

   (a) If K is a compact subset of U and $f \in H(U)$, show that

       $$\sup\{|f(z)|: z \in K\} \le \|f\| / (\sqrt{\pi} \, d_0)$$

       where $d_0$ is the Euclidean distance from K to the complement of
       U. Therefore convergence in H(U) implies uniform convergence
       on compact subsets of U. (If $z \in K$, the Cauchy integral formula
       yields

       $$f(z) = (2\pi)^{-1} \int_0^{2\pi} f(z + re^{i\theta}) \, d\theta, \quad r < d_0.$$

       Integrate this equation with respect to r, $0 \le r \le d < d_0$. Note
       also that if U is the entire plane, we may take $d_0 = \infty$, and it
       follows that $H(C) = \{0\}$.)

   (b) Show that H(U) is complete, and hence is a Hilbert space.

7. (a) If f is analytic on the unit disk $D = \{z: |z| < 1\}$ with Taylor
       expansion $f(z) = \sum_{n=0}^\infty a_n z^n$, show that

       $$\sup_{0 \le r < 1} \frac{1}{2\pi} \int_0^{2\pi} |f(re^{i\theta})|^2 \, d\theta = \sum_{n=0}^\infty |a_n|^2.$$

       It follows that if $H^2$ is the collection of all functions f analytic
       on D such that

       $$N^2(f) = \sup_{0 \le r < 1} \frac{1}{2\pi} \int_0^{2\pi} |f(re^{i\theta})|^2 \, d\theta < \infty,$$

       then $H^2$, with norm N, is a pre-Hilbert space.

   (b) If $f \in H^2$, show that

       $$\iint_D |f(x + iy)|^2 \, dx \, dy \le \pi N^2(f);$$

       hence $H^2 \subset H(D)$ and convergence in $H^2$ implies convergence
       in H(D).

   (c) If $f_n(z) = z^n$, $n = 0, 1, \ldots$, show that $f_n \to 0$ in H(D)
       but not in $H^2$.

   (d) Show that $H^2$ is complete, and hence is a Hilbert space. [By (a),
       $H^2$ is isometrically isomorphic to a subspace of $l^2$.]

   (e) If $e_n(z) = z^n$, $n = 0, 1, \ldots$, show that the $e_n$ form an
       orthonormal basis for $H^2$.

   (f) If $e_n(z) = [(2n + 2)/2\pi]^{1/2} z^n$, $n = 0, 1, \ldots$, show that
       the $e_n$ form an orthonormal basis for H(D).


8. Let M be a closed convex subset of the Hilbert space H, and $y_0$ an
   element of M. If $x \in H$, show that

   $$\|x - y_0\| = \inf\{\|x - y\|: y \in M\}$$

   iff

   $$\operatorname{Re} \langle x - y_0, y - y_0 \rangle \le 0 \quad \text{for all } y \in M.$$

*9. (a) If g is a continuous complex-valued function on $[0, 2\pi]$ with
        $g(0) = g(2\pi)$, use the Stone-Weierstrass theorem to show that g
        can be uniformly approximated by trigonometric polynomials
        $\sum_{k=-n}^n c_k e^{ikt}$. Conclude that the trigonometric
        polynomials are dense in $L^2[0, 2\pi]$.

    (b) If $f \in L^2[0, 2\pi]$, show that the Fourier series
        $\sum_{n=-\infty}^\infty a_n e^{int}$,
        $a_n = (1/2\pi) \int_0^{2\pi} f(t) e^{-int} \, dt$, converges to f in
        $L^2$, that is,

        $$\int_0^{2\pi} \Bigl| f(t) - \sum_{k=-n}^n a_k e^{ikt} \Bigr|^2 dt \to 0 \quad \text{as } n \to \infty.$$

    (c) Show that $\{e^{int}/\sqrt{2\pi}, n = 0, \pm 1, \pm 2, \ldots\}$ yields
        an orthonormal basis for $L^2[0, 2\pi]$.

10. (a) Give an example to show that if M is a nonempty, closed, but not
        convex subset of a Hilbert space H, there need not be an element
        of minimum norm in M. Thus the convexity hypothesis cannot
        be dropped from Theorem 3.2.9, even if we restrict ourselves to
        existence and forget about uniqueness.

    (b) Show that convexity is not necessary in the existence part of 3.2.9
        if H is finite-dimensional.


3.3 LINEAR OPERATORS ON NORMED LINEAR SPACES 
The idea of a linear transformation from one Euclidean space to another is
familiar. If A is a linear map from $R^n$ to $R^m$, then A is completely
specified by giving its values on a basis $e_1, \ldots, e_n$:
$A(\sum_i c_i e_i) = \sum_i c_i A(e_i)$; furthermore, A is always continuous.
If elements of $R^n$ and $R^m$ are represented by column vectors, A is
represented by an $m \times n$ matrix. If $n = m$, so that A is a linear
transformation on $R^n$, A is one-to-one iff it is onto, and if $A^{-1}$
exists, it is always continuous (as well as linear).

Linear transformations on infinite-dimensional spaces have many features
not found in the finite-dimensional case, as we shall see.

In this section, we study mappings A from one normed linear space L to 
another such space M. The mapping A will be a linear operator, that is, 
$A(ax + by) = aA(x) + bA(y)$ for all $x, y \in L$, $a, b \in C$. We use the symbol $\|\ \|$
for the norm on both spaces; no confusion should result. Linear operators can 
of course be defined on arbitrary vector spaces, but in this section, it is always 
understood that the domain and range are normed. 

Linearity does not imply continuity; to study this idea, we introduce a new 
concept. 


3.3.1 Definitions and Comments. If A is a linear operator, the norm of A 
is defined by: 


(a) $\|A\| = \sup\{\|Ax\|: x \in L, \|x\| \le 1\}$. We may express $\|A\|$ in
two other ways.

(b) $\|A\| = \sup\{\|Ax\|: x \in L, \|x\| = 1\}$.
(c) $\|A\| = \sup\{\|Ax\|/\|x\|: x \in L, x \ne 0\}$.

To see this, note that (b) $\le$ (a) is clear; if $x \ne 0$, then
$\|Ax\|/\|x\| = \|A(x/\|x\|)\|$, and $x/\|x\|$ has norm 1; hence
(c) $\le$ (b). Finally if $\|x\| \le 1$, $x \ne 0$, then
$\|Ax\| \le \|Ax\|/\|x\|$, so (a) $\le$ (c).

It follows from (c) that $\|Ax\| \le \|A\|\,\|x\|$, and in fact $\|A\|$ is the
smallest number k such that $\|Ax\| \le k\|x\|$ for all $x \in L$.

The linear operator A is said to be bounded iff $\|A\| < \infty$. Boundedness
is often easy to check, a very fortunate circumstance because we can show that
boundedness is equivalent to continuity.


3.3.2 Theorem. A linear operator A is continuous iff it is bounded. 


Proof. If A is bounded and $\{x_n\}$ is a sequence in L converging to 0, then
$\|Ax_n\| \le \|A\|\,\|x_n\| \to 0$. (We use here the fact that the mapping
$x \to \|x\|$ of L into the nonnegative reals is continuous; this follows
because $|\,\|x\| - \|y\|\,| \le \|x - y\|$.) Thus A is continuous at 0, and
therefore, by linearity, is continuous everywhere. On the other hand, if A is
unbounded, we can find elements $x_n \in L$ with $\|x_n\| \le 1$ and
$\|Ax_n\| \to \infty$. Let $y_n = x_n/\|Ax_n\|$; then $y_n \to 0$, but
$\|Ay_n\| = 1$ for all n, hence $Ay_n$ does not converge to 0. Consequently,
A is discontinuous. □

We are going to show that the set of all bounded linear operators from L
to M is itself a normed linear space, but first we consider some examples.


3.3.3 Examples. (a) Let $L = C[a, b]$, the set of all continuous
complex-valued functions on the closed bounded interval [a, b] of reals. Put
the sup norm on L:

$$\|x\| = \sup\{|x(t)|: a \le t \le b\}, \quad x \in C[a, b].$$

With this norm, L is a Banach space [see 3.1.2(a)]. Let $K = K(s, t)$ be a
continuous complex-valued function on $[a, b] \times [a, b]$, and define a
linear operator on L by

$$(Ax)(s) = \int_a^b K(s, t) x(t) \, dt, \quad a \le s \le b.$$

(By the dominated convergence theorem, Ax actually belongs to L.) We show
that A is bounded:

$$\begin{aligned}
\|Ax\| &= \sup_{a \le s \le b} \Bigl| \int_a^b K(s, t) x(t) \, dt \Bigr| \\
&\le \sup_{a \le t \le b} |x(t)| \sup_{a \le s \le b} \int_a^b |K(s, t)| \, dt \\
&= \|x\| \max_{a \le s \le b} \int_a^b |K(s, t)| \, dt
\end{aligned}$$

(note that $\int_a^b |K(s, t)| \, dt$ is a continuous function of s, by the
dominated convergence theorem). Thus

$$\|A\| \le \max_{a \le s \le b} \int_a^b |K(s, t)| \, dt < \infty.$$

In fact $\|A\| = \max_{a \le s \le b} \int_a^b |K(s, t)| \, dt$ (see Problem 3).
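
A finite-dimensional analogue may make the bound concrete. If the integral is
replaced by a finite sum, A becomes a matrix acting on $R^n$ with the sup
norm, and the bound above becomes the maximum absolute row sum. The sketch
below is such a discretized illustration only; the matrix `K` and the helper
names are assumptions, not part of the example in the text.

```python
def apply(K, x):
    """Matrix-vector product, the discrete analogue of (Ax)(s)."""
    return [sum(k * t for k, t in zip(row, x)) for row in K]

def sup_norm(x):
    return max(abs(t) for t in x)

K = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0],
     [-1.5, 0.5, 0.5]]

# discrete analogue of max over s of the integral of |K(s, t)| dt
bound = max(sum(abs(k) for k in row) for row in K)

# choosing x(t) = sign K(s, t) on the maximizing row s attains the bound
s = max(range(len(K)), key=lambda i: sum(abs(k) for k in K[i]))
x_star = [1.0 if k >= 0 else -1.0 for k in K[s]]
attained = sup_norm(apply(K, x_star))

# any other x with sup norm at most 1 stays below the bound
other = sup_norm(apply(K, [0.3, -1.0, 0.7]))
```

In the discrete case the extremal vector `x_star` has sup norm 1, so the bound
is attained. In $C[a, b]$ the corresponding sign function need not be
continuous, which is presumably why the equality is left to an approximation
argument in the problems.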

(b) Let $L = l^p(S)$, where S is the set of all integers and
$1 \le p \le \infty$. Define a linear operator T on L by
$(Tf)_n = f_{n+1}$, $n \in S$; T is called the two-sided shift or the
bilateral shift. It follows from the definition that T is one-to-one onto, and
$(T^{-1} f)_n = f_{n-1}$, $n \in S$. Also, $\|Tf\| = \|f\|$ for all $f \in L$;
hence $\|T^{-1} f\| = \|f\|$ for all $f \in L$, so that T is an isometric
isomorphism of L with itself. (In particular, $\|T\| = \|T^{-1}\| = 1$.)

If we replace S by the positive integers and define T as above, the resulting
operator is called the one-sided or unilateral shift. The one-sided shift is
onto with norm 1, but is not one-to-one;
$T(f_1, f_2, \ldots) = (f_2, f_3, \ldots)$.

The shift operators we have defined are shifts to the left. We may also define
shifts to the right; in the two-sided case we take $(Af)_n = f_{n-1}$,
$n \in S$ (so that $A = T^{-1}$). In the one-sided case, we set
$(Af)_n = f_{n-1}$, $n \ge 2$; $(Af)_1 = 0$; thus
$A(f_1, f_2, \ldots) = (0, f_1, f_2, \ldots)$. The operator A is one-to-one but
not onto; A(L) is the closed subspace of L consisting of those sequences whose
first coordinate is 0.
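
The one-sided shifts are easy to experiment with on sequences having only
finitely many nonzero terms. In the sketch below such a sequence is stored as
a Python list, with all later coordinates understood to be 0; this
representation and the function names are assumptions made for illustration.

```python
def left_shift(f):
    """T(f1, f2, f3, ...) = (f2, f3, ...)."""
    return f[1:]

def right_shift(f):
    """A(f1, f2, ...) = (0, f1, f2, ...)."""
    return [0.0] + f

def p_norm(f, p):
    return sum(abs(t) ** p for t in f) ** (1 / p)

f = [3.0, -1.0, 2.0, 0.5]

round_trip = left_shift(right_shift(f))    # T(Af) = f
other_way = right_shift(left_shift(f))     # A(Tf) = (0, f2, f3, ...), first coordinate lost

e1 = [1.0, 0.0, 0.0, 0.0]
killed = left_shift(e1)                    # T sends e1 to the zero sequence
```

So TA is the identity while AT is not, which reflects the statements above: A
is one-to-one but not onto, and T is onto but not one-to-one. The right shift
also preserves every $l^p$ norm, since it only inserts a zero coordinate.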

(c) Assume that the Banach space L is the direct sum of the two closed
subspaces M and N, in other words each $x \in L$ can be represented in a unique
way as $y + z$ for some $y \in M$, $z \in N$. (We have already encountered this
situation with L a Hilbert space, M a closed subspace, $N = M^\perp$.) We define
a linear operator P on L by 


Px = y. 


P is called a projection; specifically, P is the projection of L on M and it has 
the following properties: 


(1) P is idempotent, that is, $P^2 = P$, where $P^2$ is the composition of P
with itself.
(2) P is continuous. 


Property (1) follows from the definition of P; property (2) will be proved later, 
as a consequence of the closed graph theorem (see after 3.4.16). 
Conversely, let P be a continuous idempotent linear operator on L. Define 


$$M = \{x \in L: Px = x\}, \quad N = \{x \in L: Px = 0\}.$$


Then we can prove that M and N are closed subspaces, L is the direct sum of 
M and N, and P is the projection of L on M. 

By continuity of P, M and N are closed subspaces. If $x \in L$, then
$x = Px + (I - P)x = y + z$ where $y \in M$, $z \in N$. Since
$M \cap N = \{0\}$, L is the direct sum of M and N. Furthermore, $Px = y$ by
definition of y, so that P is the projection of L on M.
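
A small example in $R^2$ shows the decomposition at work when M and N are not
perpendicular. The sketch below assumes M is spanned by (1, 0) and N by
(1, 1), so that $(x_1, x_2) = (x_1 - x_2, 0) + (x_2, x_2)$; these subspaces
are an illustrative choice only.

```python
def P(x):
    """Projection of R^2 on M = span{(1, 0)} along N = span{(1, 1)}."""
    x1, x2 = x
    return (x1 - x2, 0.0)

x = (3.0, 2.0)
y = P(x)                              # component of x in M
z = (x[0] - y[0], x[1] - y[1])        # (I - P)x, the component of x in N

idempotent = P(P(x)) == P(x)          # P composed with itself equals P
fixes_M = P((5.0, 0.0)) == (5.0, 0.0) # Px = x on M
kills_N = P((4.0, 4.0)) == (0.0, 0.0) # Px = 0 on N
```

The sets $\{x: Px = x\}$ and $\{x: Px = 0\}$ recover M and N, as in the
converse argument above.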

If f is a linear operator from a vector space L to the scalar field, f is called 
a linear functional. (The norm of a scalar b is taken as |b|.) Considerable 
insight is gained about normed linear spaces by studying ways of representing 
continuous linear functionals on such spaces. We give some examples. 


3.3.4 Representations of Continuous Linear Functionals. (a) Let f be a
continuous linear functional on the Hilbert space H. We show that there is a
unique element $y \in H$ such that

$$f(x) = \langle x, y \rangle \quad \text{for all } x \in H.$$


This is one of several results called the Riesz representation theorem. 


If the desired y exists, it must be unique, for if
$\langle x, y \rangle = \langle x, z \rangle$ for all x, then $y - z$ is
orthogonal to everything in H (including itself), so $y = z$.

To prove existence, let N be the null space of f, that is,
$N = \{x \in H: f(x) = 0\}$; N is a closed subspace since f is continuous and
linear. If $N^\perp = \{0\}$, then $N = H$ by the projection theorem; hence
$f = 0$, and we may take $y = 0$. Thus assume we have an element
$u \in N^\perp$ with $u \ne 0$. Then $u \notin N$, and if we define
$z = u/f(u)$, we have $z \in N^\perp$ and $f(z) = 1$.

If $x \in H$ and $f(x) = a$, then

$$x = (x - az) + az, \quad \text{with } x - az \in N, \; az \perp N.$$

If $y = z/\|z\|^2$, then

$$\begin{aligned}
\langle x, y \rangle &= \langle x - az, y \rangle + a \langle z, y \rangle \\
&= a \langle z, y \rangle \quad \text{since } y \perp N \\
&= a = f(x)
\end{aligned}$$

as desired.

The above argument shows that if f is not identically 0, then $N^\perp$ is
one-dimensional. For if $x \in N^\perp$ and $f(x) = a$, then
$x - az \in N \cap N^\perp$, hence $x = az$. Therefore
$N^\perp = \{az: a \in C\}$.

Notice also that if $\|f\|$ is the norm of f, considered as a linear operator,
then

$$\|f\| = \|y\|.$$

For $|f(x)| = |\langle x, y \rangle| \le \|x\|\,\|y\|$ for all x, hence
$\|f\| \le \|y\|$; but $|f(y)| = |\langle y, y \rangle| = \|y\|\,\|y\|$, so
$\|f\| \ge \|y\|$.

Now consider the space $H^*$ of all continuous linear functionals on H; $H^*$
is a vector space under the usual operations of addition and scalar
multiplication. If to each $f \in H^*$ we associate the element of H given by
the Riesz representation theorem, we obtain a map $\psi: H^* \to H$ that is
one-to-one onto, norm-preserving, and conjugate linear; that is,

$$\psi(af + bg) = \bar{a} \psi(f) + \bar{b} \psi(g), \quad f, g \in H^*, \; a, b \in C.$$

[Note that if $f(x) = \langle x, y \rangle$ for all x, then
$af(x) = \langle x, \bar{a} y \rangle$.] Such a map is called
a conjugate isometry.
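
In $C^n$ the representing vector can be written down directly: if
$f(x) = \sum_k c_k x_k$, then $y = (\bar{c}_1, \ldots, \bar{c}_n)$ satisfies
$f(x) = \langle x, y \rangle$. The sketch below checks this and the norm
identity $\|f\| = \|y\|$; the coefficients `c` and the helper names are
assumptions chosen for illustration.

```python
import math

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

c = [1 - 2j, 3j, 2 + 0j]

def f(x):
    """A linear functional on C^3 given by fixed coefficients c."""
    return sum(ck * xk for ck, xk in zip(c, x))

y = [ck.conjugate() for ck in c]          # representing vector for f

x = [0.5 + 1j, -2 + 0j, 1 - 1j]
difference = abs(f(x) - inner(x, y))      # zero up to rounding

norm_y = math.sqrt(inner(y, y).real)
unit = [t / norm_y for t in y]            # the unit vector y / ||y||
attained = abs(f(unit))                   # |f| reaches ||y|| at this vector
```

The conjugate in the definition of `y` is the source of the conjugate
linearity of the map $\psi$ described above.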

(b) Let f be a continuous linear functional on $l^p$ ($= l^p(S)$, where S is
the set of positive integers), $1 < p < \infty$. We show that if q is defined
by $(1/p) + (1/q) = 1$, there is a unique element
$y = (y_1, y_2, \ldots) \in l^q$ such that

$$f(x) = \sum_{k=1}^\infty x_k y_k \quad \text{for all } x \in l^p.$$

Furthermore,

$$\|f\| = \|y\| = \Bigl( \sum_{k=1}^\infty |y_k|^q \Bigr)^{1/q}.$$

To prove this, let $e_n$ be the sequence in $l^p$ defined by $e_n(j) = 0$,
$j \ne n$; $e_n(n) = 1$. If $x \in l^p$, then
$\|x - \sum_{k=1}^n x_k e_k\|^p = \sum_{k=n+1}^\infty |x_k|^p \to 0$; hence
$x = \sum_{k=1}^\infty x_k e_k$, where the series converges in $l^p$. By
continuity of f,

$$f(x) = \sum_{k=1}^\infty x_k y_k, \quad \text{where } y_k = f(e_k). \qquad (1)$$

Now write $y_k$ in polar form, that is, $y_k = r_k e^{i\theta_k}$,
$r_k \ge 0$. Let

$$z_n = (r_1^{q-1} e^{-i\theta_1}, \ldots, r_n^{q-1} e^{-i\theta_n}, 0, 0, \ldots);$$

by (1),

$$f(z_n) = \sum_{k=1}^n r_k^{q-1} e^{-i\theta_k} r_k e^{i\theta_k} = \sum_{k=1}^n |y_k|^q. \qquad (2)$$

But

$$\begin{aligned}
|f(z_n)| \le \|f\|\,\|z_n\| &= \|f\| \Bigl( \sum_{k=1}^n r_k^{(q-1)p} \Bigr)^{1/p} \\
&= \|f\| \Bigl( \sum_{k=1}^n |y_k|^q \Bigr)^{1/p}.
\end{aligned}$$

By Eq. (2),

$$\Bigl( \sum_{k=1}^n |y_k|^q \Bigr)^{1/q} \le \|f\|;$$

hence $y \in l^q$ and $\|y\| \le \|f\|$. To prove that $\|f\| \le \|y\|$,
observe that Eq. (1) is of the form $f(x) = \int_\Omega xy \, d\mu$ where
$\Omega$ is the positive integers and $\mu$ is counting measure. By Hölder's
inequality, $|f(x)| \le \|x\|\,\|y\|$, so that $\|f\| \le \|y\|$.

Finally, we prove uniqueness. If $y, z \in l^q$ and
$f(x) = \sum_{k=1}^\infty x_k y_k = \sum_{k=1}^\infty x_k z_k$ for all
$x \in l^p$, then $g(x) = \sum_{k=1}^\infty x_k (y_k - z_k) = 0$ for all
$x \in l^p$. The above argument with f replaced by g shows that
$\|g\| = \|y - z\|$; hence $\|y - z\| = 0$, and thus $y = z$.
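
The role of the vectors $z_n$ in the proof can be seen in a finite-dimensional
sketch. Below, y has only four nonzero coordinates and $p = 3$, so that
$q = 3/2$; the vector z is built exactly as in the proof, and the ratio
$|f(z)|/\|z\|_p$ is compared with $\|y\|_q$. The numerical values and helper
names are assumptions made for illustration.

```python
import cmath

def p_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1 / p)

p = 3.0
q = p / (p - 1)                           # conjugate exponent: 1/p + 1/q = 1

y = [2 - 1j, 0.5j, -3 + 0j, 1 + 1j]       # represents f(x) = sum_k x_k y_k

def f(x):
    return sum(a * b for a, b in zip(x, y))

# z_k = r_k^(q-1) e^{-i theta_k}, where y_k = r_k e^{i theta_k}
z = [abs(t) ** (q - 1) * cmath.exp(-1j * cmath.phase(t)) for t in y]

ratio = abs(f(z)) / p_norm(z, p)          # the lower bound for ||f|| produced by z
y_q = p_norm(y, q)

x = [1 + 0j, -2j, 0.5 + 0.5j, 3 + 0j]
holder_ok = abs(f(x)) <= p_norm(x, p) * y_q
```

Up to rounding, `ratio` equals `y_q`, so z shows that $\|f\| \ge \|y\|$, while
Hölder's inequality (checked on one sample x) gives the reverse bound.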

(c) Let f be a continuous linear functional on $l^1$. We show that there is a
unique element $y \in l^\infty$ such that

$$f(x) = \sum_{k=1}^\infty x_k y_k \quad \text{for all } x \in l^1.$$

Furthermore, $\|f\| = \|y\| = \sup_k |y_k|$.
The argument of (b) may be repeated up to Eq. (1). In this case, however,
if $y_k = r_k e^{i\theta_k}$, $k = 1, 2, \ldots$, we take

$$z_n = (0, \ldots, 0, e^{-i\theta_n}, 0, \ldots), \quad \text{with } e^{-i\theta_n} \text{ in position } n.$$

Thus by (1),

$$f(z_n) = e^{-i\theta_n} r_n e^{i\theta_n} = |y_n|.$$

But

$$|f(z_n)| \le \|f\|\,\|z_n\| = \|f\|.$$

Therefore $|y_n| \le \|f\|$ for all n, so that $y \in l^\infty$ and
$\|y\| \le \|f\|$. But by (1),

$$|f(x)| \le \Bigl( \sup_k |y_k| \Bigr) \sum_{k=1}^\infty |x_k| = \|y\|\,\|x\|;$$

hence $\|f\| \le \|y\|$. Uniqueness is proved as in (b).

In 3.3.4(b) and (c), the map $f \to y$ of $(l^p)^*$ to $l^q$ [$q = \infty$ in 3.3.4(c)] is linear and norm-preserving, hence one-to-one. To show that it is onto, observe that any linear functional of the form $f(x) = \sum_{k=1}^{\infty} x_k y_k$, $x \in l^p$, $y \in l^q$, satisfies $\|f\| \le \|y\|$ by the analysis of 3.3.4(b) and (c). Therefore $f$ is continuous, so that every $y \in l^q$ is the image of some $f \in (l^p)^*$. Since $\|f\| = \|y\|$, we have an isometric isomorphism of $(l^p)^*$ and $l^q$.

If we replace $y_k$ by $\bar y_k$, we obtain the result that there is a unique $y \in l^q$ such that $f(x) = \sum_{k=1}^{\infty} x_k \bar y_k$ for all $x \in l^p$. This makes the map $f \to y$ a conjugate isometry rather than an isometric isomorphism. If $p = 2$, then $q = 2$ also, and thus we have another proof of the Riesz representation theorem for separable Hilbert spaces (see 3.2.16). In fact, essentially the same argument may be used in an arbitrary Hilbert space if the $e_n$ are replaced by an arbitrary orthonormal basis. (Other examples of representation of continuous linear functionals are given in Problems 10 and 11.)

We now show that the set of bounded linear operators from one normed 
linear space to another can be made into a normed linear space. 
3.3.5 Theorem. Let L and M be normed linear spaces, and let [L, M] be the
collection of all bounded linear operators from L to M. The operator norm 
defined by 3.3.1 is a norm on [L, M], and if M is complete, then [L, M] is 
complete. In particular, the set L* of all continuous linear functionals on L is 
a Banach space (whether or not L is complete). 


Proof. It follows from 3.3.1 that $\|A\| \ge 0$ and $\|aA\| = |a|\,\|A\|$ for all $A \in [L, M]$ and $a \in \mathbb{C}$. Also by 3.3.1, if $\|A\| = 0$, then $Ax = 0$ for all $x \in L$, hence $A = 0$. If $A, B \in [L, M]$, then again by 3.3.1,

$$\|A + B\| = \sup\{\|(A + B)x\| : x \in L,\ \|x\| \le 1\}.$$

Since $\|(A + B)x\| \le \|Ax\| + \|Bx\|$ for all $x$,

$$\|A + B\| \le \|A\| + \|B\|,$$

and it follows that $[L, M]$ is a vector space and the operator norm is in fact a norm on $[L, M]$.

Now let $A_1, A_2, \ldots$ be a Cauchy sequence in $[L, M]$. Then

$$\|(A_n - A_m)x\| \le \|A_n - A_m\|\,\|x\| \to 0 \quad \text{as } n, m \to \infty. \tag{1}$$

Therefore $\{A_n x\}$ is a Cauchy sequence in $M$ for each $x \in L$; since $M$ is complete, $A_n$ converges pointwise on $L$ to an operator $A$. Since the $A_n$ are linear, so is $A$ (observe that $A_n(ax + by) = aA_n x + bA_n y$, and let $n \to \infty$). Now given $\varepsilon > 0$, choose $N$ such that $\|A_n - A_m\| \le \varepsilon$ for $n, m \ge N$. Fix $n \ge N$ and let $m \to \infty$ in Eq. (1) to conclude that $\|(A_n - A)x\| \le \varepsilon\|x\|$ for $n \ge N$; therefore $\|A_n - A\| \to 0$ as $n \to \infty$. Since $\|A\| \le \|A - A_n\| + \|A_n\|$, we have $A \in [L, M]$ and $A_n \to A$ in the operator norm. □


In the above proof we have talked about two different types of convergence 
of sequences of operators. 


3.3.6 Definitions and Comments. Let $A, A_1, A_2, \ldots \in [L, M]$. We say that $A_n$ converges uniformly to $A$ iff $\|A_n - A\| \to 0$ (notation: $A_n \xrightarrow{u} A$). Since $\|(A_n - A)x\| \le \|A_n - A\|\,\|x\|$, uniform operator convergence means that $A_n x \to Ax$, uniformly for $\|x\| \le 1$ (or equally well for $\|x\| \le k$, $k$ any positive real number).

We say that $A_n$ converges pointwise to $A$ iff $A_n x \to Ax$ for each $x \in L$. Thus, pointwise operator convergence is pointwise convergence on all of $L$.

Uniform convergence implies pointwise convergence, but not conversely. For example, let $\{e_1, e_2, \ldots\}$ be an infinite orthonormal set in a Hilbert space, and let $A_n x = \langle x, e_n\rangle$, $n = 1, 2, \ldots$. Then $A_n$ converges pointwise to 0 by Bessel's inequality, but $A_n$ does not converge uniformly to 0. In fact $\|A_n e_n\| = 1$ for all $n$, hence $\|A_n\| = 1$.
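A small numerical illustration may help separate the two modes of convergence. The sketch below assumes a truncation of $l^2$ to 1000 coordinates and takes $e_n$ to be the standard unit vectors, so that $A_n x$ is simply the $n$th coordinate of $x$; the names `A` and `e` are illustrative.

```python
def A(n, x):
    """A_n x = <x, e_n>: the n-th coordinate (1-indexed) of x."""
    return x[n - 1]

def e(n, dim):
    """n-th standard unit vector in a dim-dimensional truncation."""
    return [1.0 if k == n else 0.0 for k in range(1, dim + 1)]

dim = 1000
x = [1.0 / k for k in range(1, dim + 1)]     # a fixed square-summable vector
pointwise = [abs(A(n, x)) for n in (1, 10, 100, 1000)]
on_unit_vectors = [abs(A(n, e(n, dim))) for n in (1, 10, 100, 1000)]
print(pointwise)         # [1.0, 0.1, 0.01, 0.001]
print(on_unit_vectors)   # [1.0, 1.0, 1.0, 1.0]
```

For the fixed vector the values shrink with $n$, while the supremum over the unit ball stays at 1 because the test vector is allowed to move with $n$.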

There is an important property of finite-dimensional spaces that we are now in a position to discuss. In the previous section, we regarded $\mathbb{C}^n$ as a Hilbert space, so that if $x \in \mathbb{C}^n$, the norm of $x$ was taken as the Euclidean norm $\|x\| = (\sum_{i=1}^{n} |x_i|^2)^{1/2}$. The metric associated with this norm yields the standard topology on $\mathbb{C}^n$. However, we may put various other norms on $\mathbb{C}^n$, for example, the $L^p$ norm $\|x\|_p = (\sum_{i=1}^{n} |x_i|^p)^{1/p}$, $1 \le p < \infty$, or the sup norm $\|x\|_\infty = \max(|x_1|, \ldots, |x_n|)$. Since the space is finite-dimensional, there are no convergence difficulties and all elements have finite norm. Fortunately, the proliferation of norms causes no confusion because all norms on a given finite-dimensional space induce the same topology. In other words, the open sets in $\mathbb{C}^n$ will be the same, regardless of which norm we use. The proof of this result is outlined in Problems 6 and 7.
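For the three norms just mentioned, the comparison constants can be written down explicitly: $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n\|x\|_\infty$ on $\mathbb{C}^n$. The following sketch checks this chain on randomly generated vectors; the dimension, the sample size, and the helper names are assumptions made for the illustration, and a random test is of course no substitute for the proof outlined in the problems.

```python
import random

def norm_p(x, p):
    """(sum |x_i|^p)^(1/p) on C^n."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def norm_sup(x):
    """max |x_i| on C^n."""
    return max(abs(t) for t in x)

def chain_holds(x, tol=1e-12):
    """Check ||x||_sup <= ||x||_2 <= ||x||_1 <= n * ||x||_sup."""
    n = len(x)
    s, two, one = norm_sup(x), norm_p(x, 2), norm_p(x, 1)
    return s <= two + tol and two <= one + tol and one <= n * s + tol

random.seed(0)   # fixed seed so the run is reproducible
samples = [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5)]
           for _ in range(1000)]
print(all(chain_holds(x) for x in samples))   # True
```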


Problems 


1. Let $f$ be a linear functional on the normed linear space $L$. If $f$ is not identically 0, show that the following are equivalent:

(a) $f$ is continuous.
(b) The null space $N = f^{-1}\{0\}$ is closed.
(c) $N$ is not dense in $L$.
(d) $f$ is bounded on some neighborhood of 0.
[To prove that (c) implies (d), show that if $B(x, \varepsilon) \cap N = \emptyset$, and $f$ is unbounded on $B(0, \varepsilon)$, then $f(B(0, \varepsilon)) = \mathbb{C}$. In particular, there is a point $z \in B(0, \varepsilon)$ such that $f(z) = -f(x)$, and this leads to a contradiction.]

2. Show that any infinite-dimensional normed linear space has a discontinuous linear functional. (Let $e_1, e_2, \ldots$ be an infinite sequence of linearly independent elements such that $\|e_n\| = 1$ for all $n$. Define $f$ appropriately on the $e_n$ and extend $f$ to the whole space using linearity.)


3. In Example 3.3.3(a), show that $\|A\| = \max_{a \le s \le b} \int_a^b |K(s, t)|\,dt$. [If $\int_a^b |K(s, t)|\,dt$ assumes a maximum at the point $u$, and $K(u, t) = r(t)e^{i\theta(t)}$, let $z(t) = e^{-i\theta(t)}$. Let $x_1, x_2, \ldots$ be continuous functions such that $|x_n(t)| \le 1$ for all $n$ and $t$, and $\int_a^b |x_n(t) - z(t)|\,dt \to 0$ as $n \to \infty$. Since $\|Ax_n\| \le \|A\|$ and $(Ax_n)(s) \to \int_a^b K(s, t)z(t)\,dt$ as $n \to \infty$, it follows that $\int_a^b |K(u, t)|\,dt \le \|A\|$.]

The same argument, with integrals replaced by sums, shows that if $A$ is a matrix operator on $\mathbb{C}^n$, with sup norm, $\|A\| = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$.
4. Let $M$ be a linear manifold in the normed linear space $L$. Denote by $[x]$ the coset of $x$ modulo $M$, that is, $\{x + y : y \in M\}$. Define

$$\|[x]\| = \inf\{\|y\| : y \in [x]\};$$

note that $\|[x]\| = \inf\{\|x - z\| : z \in M\} = \operatorname{dist}(x, M)$. Show that the above formula defines a seminorm on the quotient space $L/M$, a norm if $M$ is closed.

5. If L is a Banach space and M is a closed subspace, show that L/M is a
Banach space.


6. (a) Let $A$ be a linear operator from $L$ to $M$. Show that $A$ is one-to-one with $A^{-1}$ continuous on its domain $A(L)$, iff there is a finite number $m > 0$ such that $\|Ax\| \ge m\|x\|$ for all $x \in L$.

(b) Let $\|\ \|_1$ and $\|\ \|_2$ be norms on the linear space $L$. Show that the norms induce the same topology (in other words, the open sets are the same for each norm) iff there are finite numbers $m, M > 0$ such that

$$m\|x\|_1 \le \|x\|_2 \le M\|x\|_1 \quad \text{for all } x \in L.$$

[This may be done using part (a), or it may be shown directly that, for example, if $\|x\|_1 \le (1/m)\|x\|_2$ for all $x$, then the topology induced by $\|\ \|_1$ is weaker than (that is, included in) the topology induced by $\|\ \|_2$.]


7. Let $L$ be a finite-dimensional normed linear space, with basis $e_1, \ldots, e_n$. Let $\|\ \|_1$ be any norm on $L$, and define

$$\|x\|_2 = \Bigl(\sum_{i=1}^{n} |x_i|^2\Bigr)^{1/2}, \quad \text{where } x = \sum_{i=1}^{n} x_i e_i.$$

(a) Show that for some positive real number $k$, $\|x\|_1 \le k\|x\|_2$ for all $x \in L$.

(b) Show that for some positive real number $m$, $\|x\|_1 \ge m$ on $\{x : \|x\|_2 = 1\}$. [By (a), the map $x \to \|x\|_1$ is continuous in the topology induced by $\|\ \|_2$.]

(c) Show that $\|x\|_1 \ge m\|x\|_2$ for all $x \in L$, and conclude that all norms on a finite-dimensional space induce the same topology.

(d) Let $M$ be an arbitrary normed linear space, and let $L$ be a finite-dimensional subspace of $M$. Show that $L$ is closed in $M$.


8. (Riesz lemma) Let $M$ be a closed proper subspace of the normed linear space $L$. Show that if $0 < \delta < 1$, there is an element $x_\delta \in L$ such that $\|x_\delta\| = 1$ and $\|x - x_\delta\| \ge \delta$ for all $x \in M$. [Choose $x_1 \notin M$, and let $d = \operatorname{dist}(x_1, M) > 0$ since $M$ is closed. Choose $x_0 \in M$ such that $\|x_1 - x_0\| \le d/\delta$, and set $x_\delta = (x_1 - x_0)/\|x_1 - x_0\|$.]

9. Let $L$ be a normed linear space. Show that the following are equivalent:

(a) $L$ is finite-dimensional.

(b) $L$ is topologically isomorphic to a Euclidean space $\mathbb{C}^n$ (or $\mathbb{R}^n$ if $L$ is a real vector space), that is, there is a one-to-one, onto, linear, bicontinuous map between the two spaces.

(c) $L$ is locally compact (every point of $L$ has a neighborhood whose closure is compact).

(d) Every closed bounded subset of $L$ is compact.

(e) The set $\{x \in L : \|x\| = 1\}$ is compact.

(f) The set $\{x \in L : \|x\| = 1\}$ is totally bounded, that is, can be covered by a finite number of open balls of any preassigned radius.

[Problem 7 shows that (a) implies (b); (b) implies (c), (d) implies (e), and (e) implies (f) are obvious, and (c) implies (d) is easy. To prove that (f) implies (a), use Problem 8.]

10. Let $c$ be the space of convergent sequences of complex numbers [see 3.1.2(b)]. If $f$ is a continuous linear functional on $c$, show that there is a unique element $y = (y_0, y_1, \ldots) \in l^1$ such that for all $x \in c$,

$$f(x) = \Bigl(\lim_{n \to \infty} x_n\Bigr)\Bigl(y_0 - \sum_{k=1}^{\infty} y_k\Bigr) + \sum_{k=1}^{\infty} x_k y_k.$$

Furthermore, $\|f\| = |y_0 - \sum_{k=1}^{\infty} y_k| + \sum_{k=1}^{\infty} |y_k|$. If $c_0$ is the closed subspace of $c$ consisting of those sequences converging to 0, the representation of a continuous linear functional on $c_0$ is simpler: $f(x) = \sum_{k=1}^{\infty} x_k y_k$, and $\|f\| = \sum_{k=1}^{\infty} |y_k|$. Thus $c_0^*$ is isometrically isomorphic to $l^1$.

11. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space.

(a) Assume $\mu$ finite, $1 < p < \infty$, $(1/p) + (1/q) = 1$. If $f$ is a continuous linear functional on $L^p = L^p(\Omega, \mathcal{F}, \mu)$, show that there is an element $y \in L^q$ such that

$$f(x) = \int_\Omega xy\,d\mu \quad \text{for all } x \in L^p.$$

Furthermore, $\|f\| = \|y\|_q$. If $y_1$ is another such function, show that $y = y_1$ a.e. $[\mu]$. [Define $\lambda(A) = f(I_A)$, $A \in \mathcal{F}$, and apply the Radon-Nikodym theorem.]

(b) Drop the finiteness assumption on $\mu$. If $A \in \mathcal{F}$ and $\mu(A) < \infty$, part (a) applied to $(A, \mathcal{F} \cap A, \mu)$ provides an essentially unique $y_A \in L^q$ such that $y_A = 0$ on $A^c$ and

$$f(xI_A) = \int_\Omega x y_A\,d\mu \quad \text{for all } x \in L^p;$$

also, $\|y_A\|_q$ is the norm of the restriction of $f$ to $A$, so

$$\|y_A\|_q \le \|f\|.$$

(i) If $\mu(A) < \infty$ and $\mu(B) < \infty$, show that $y_A = y_B$ a.e. $[\mu]$ on $A \cap B$. Thus $y_{A \cup B}$ may be obtained by piecing together $y_A$ and $y_B$.

(ii) Let $A_n$, $n = 1, 2, \ldots$, be sets with $\|y_{A_n}\|_q \to k = \sup\{\|y_A\|_q : A \in \mathcal{F},\ \mu(A) < \infty\} \le \|f\|$. Show that $y_{A_n}$ converges in $L^q$ to a limit function $y$, and $y$ is essentially independent of the particular sequence $\{A_n\}$. Furthermore, $\|y\|_q \le \|f\|$.

(iii) Show that $f(x) = \int_\Omega xy\,d\mu$ for all $x \in L^p$. Since $\|y\|_q \le \|f\|$ by (ii) and $\|f\| \le \|y\|_q$ by Hölder's inequality, we have $\|f\| = \|y\|_q$. Thus the result of (a) holds for arbitrary $\mu$.

(c) Prove (a) with $\mu$ finite, $p = 1$, $q = \infty$.

(d) Prove (a) with $\mu$ $\sigma$-finite, $p = 1$, $q = \infty$. [For an extension of this result, see Kelley and Namioka (1963, Problem 14M).]

It follows that there is an isometric isomorphism of $(L^p)^*$ and $L^q$ if $1 < p < \infty$, $1 < q < \infty$, $(1/p) + (1/q) = 1$; if $\mu$ is $\sigma$-finite, this is true also for $p = 1$, $q = \infty$.


3.4 Basic Theorems of Functional Analysis

Almost every area of functional analysis leans heavily on at least one of 
the three basic results of this section: the Hahn—Banach theorem, the uni- 
form boundedness principle, and the open mapping theorem. We are going to 
establish these results and discuss applications. 

We first consider an extension problem. If f is a linear functional defined 
on a subspace M of a vector space L, there is no difficulty in extending f 
to a linear functional on all of L; simply extend a Hamel basis of M to a 
Hamel basis for L, define f arbitrarily on the basis vectors not belonging to 
M, and extend by linearity. However, if L is normed and we require that the
extension of f have the same norm as the original functional, the problem 
becomes more difficult. We first prove a preliminary result. 


3.4.1 Lemma. Let $L$ be a real vector space, not necessarily normed, and let $p$ be a map from $L$ to $\mathbb{R}$ satisfying

$$p(x + y) \le p(x) + p(y) \quad \text{for all } x, y \in L,$$
$$p(ax) = a\,p(x) \quad \text{for all } x \in L \text{ and all } a > 0.$$

[The first property is called subadditivity, the second positive-homogeneity. Note that positive-homogeneity implies that $p(0) = 0$ [set $x = 0$ to obtain $p(0) = ap(0)$ for all $a > 0$]. A subadditive, positive-homogeneous map is sometimes called a sublinear functional.]

Let $M$ be a subspace of $L$ and $g$ a linear functional defined on $M$ such that $g(x) \le p(x)$ for all $x \in M$. Let $x_0$ be a fixed element of $L$. For any real number $c$, the following are equivalent:

(1) $g(x) + \lambda c \le p(x + \lambda x_0)$ for all $x \in M$ and all $\lambda \in \mathbb{R}$.
(2) $-p(-x - x_0) - g(x) \le c \le p(x + x_0) - g(x)$ for all $x \in M$.

Furthermore, there is a real number $c$ satisfying (2), and hence (1).

Proof. To prove that (1) implies (2), first set $\lambda = 1$, and then set $\lambda = -1$ and replace $x$ by $-x$ [note $g(-x) = -g(x)$ by linearity]. Conversely, if (2) holds and $\lambda > 0$, replace $x$ by $x/\lambda$ in the right-hand inequality of (2); if $\lambda < 0$, replace $x$ by $x/\lambda$ in the left-hand inequality. In either case, the positive homogeneity of $p$ yields (1). If $\lambda = 0$, (1) is true by hypothesis.

To produce the desired $c$, let $x$ and $y$ be arbitrary elements of $M$. Then

$$g(x) - g(y) = g(x - y) \le p(x - y) \le p(x + x_0) + p(-y - x_0) \quad \text{by subadditivity.}$$

It follows that

$$\sup_{y \in M}\,[-p(-y - x_0) - g(y)] \le \inf_{x \in M}\,[p(x + x_0) - g(x)].$$

Any $c$ between the sup and the inf will work. □
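To see the interval of admissible $c$ in a concrete case, consider the following sketch. Everything in it is an illustrative assumption rather than part of the lemma: $L = \mathbb{R}^2$, $p(x) = |x_1| + |x_2|$, $M = \{(t, 0)\}$, $g((t, 0)) = t$, and $x_0 = (0, 1)$. The sup and inf in the proof are approximated over a finite grid of rational points of $M$, using exact arithmetic; for this example the grid values coincide with the true sup and inf, which are $-1$ and $1$.

```python
from fractions import Fraction

def p(v):
    """Sublinear functional: the l^1 norm on R^2."""
    return abs(v[0]) + abs(v[1])

def g(t):
    """Linear functional on M = {(t, 0)}: g((t, 0)) = t, so g <= p on M."""
    return t

x0 = (Fraction(0), Fraction(1))
grid = [Fraction(i, 4) for i in range(-40, 41)]   # sample points t of M

lower = max(-p((-t - x0[0], -x0[1])) - g(t) for t in grid)
upper = min(p((t + x0[0], x0[1])) - g(t) for t in grid)
print(lower, upper)   # -1 1

def satisfies_condition_1(c):
    """Check g(x) + lam*c <= p(x + lam*x0) on the sample grid."""
    return all(g(t) + lam * c <= p((t + lam * x0[0], lam * x0[1]))
               for t in grid for lam in grid)

print(satisfies_condition_1(Fraction(1, 2)), satisfies_condition_1(Fraction(3, 2)))
# True False
```

A value of $c$ inside $[-1, 1]$ passes the test of condition (1), while $c = 3/2$ fails already at $x = 0$, $\lambda = 1$.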
We may now prove the main extension theorem. 


3.4.2 Hahn-Banach Theorem. Let $p$ be a subadditive, positive-homogeneous functional on the real linear space $L$, and $g$ a linear functional on the subspace $M$, with $g \le p$ on $M$. There is a linear functional $f$ on $L$ such that $f = g$ on $M$ and $f \le p$ on all of $L$.

Proof. If $x_0 \notin M$, consider the subspace $M_1 = L(M \cup \{x_0\})$, consisting of all elements $x + \lambda x_0$, $x \in M$, $\lambda \in \mathbb{R}$. We may extend $g$ to a linear functional $g_1$ on $M_1$ by defining $g_1(x + \lambda x_0) = g(x) + \lambda c$, where $c$ is any real number. If we choose $c$ to satisfy (1) of 3.4.1, then $g_1 \le p$ on $M_1$.

Now let $\mathcal{C}$ be the collection of all pairs $(h, H)$ where $h$ is an extension of $g$ to the subspace $H \supset M$, and $h \le p$ on $H$. Partially order $\mathcal{C}$ by $(h_1, H_1) \le (h_2, H_2)$ iff $H_1 \subset H_2$ and $h_1 = h_2$ on $H_1$; then every chain in $\mathcal{C}$ has an upper bound (consider the union of all subspaces in the chain). By Zorn's lemma, $\mathcal{C}$ has a maximal element $(f, F)$. If $F \ne L$, the first part of the proof yields an extension of $f$ to a larger subspace, contradicting maximality. □


There is a version of the Hahn-Banach theorem for complex spaces. First, we observe that if $L$ is a vector space over $\mathbb{C}$, $L$ is automatically a vector space over $\mathbb{R}$, since we may restrict scalar multiplication to real scalars. For example, $\mathbb{C}^n$ is an $n$-dimensional space over $\mathbb{C}$, with basis vectors $(1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)$. If $\mathbb{C}^n$ is regarded as a vector space over $\mathbb{R}$, it becomes $2n$-dimensional, with basis vectors

$$(1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1), (i, 0, \ldots, 0), \ldots, (0, \ldots, 0, i).$$

Now if $f$ is a linear functional on $L$, with $f_1 = \operatorname{Re} f$, $f_2 = \operatorname{Im} f$, then $f_1$ and $f_2$ are linear functionals on $L'$, where $L'$ is $L$ regarded as a vector space over $\mathbb{R}$. Also, for all $x \in L$,

$$f(ix) = f_1(ix) + i f_2(ix).$$

But

$$f(ix) = i f(x) = -f_2(x) + i f_1(x).$$

Thus $f_1(ix) = -f_2(x)$, $f_2(ix) = f_1(x)$; consequently

$$f(x) = f_1(x) - i f_1(ix) = f_2(ix) + i f_2(x).$$

Therefore $f$ is determined by $f_1$ (or by $f_2$). Conversely, let $f_1$ be a linear functional on $L'$. Then $f_1$ is a map from $L$ to $\mathbb{R}$ such that $f_1(ax + by) = af_1(x) + bf_1(y)$ for all $x, y \in L$ and all $a, b \in \mathbb{R}$. Define $f(x) = f_1(x) - i f_1(ix)$, $x \in L$. It follows that $f$ is a linear functional on $L$ (and $f_1 = \operatorname{Re} f$). For $f_1$ is additive, hence so is $f$, and if $a, b \in \mathbb{R}$, we have

$$\begin{aligned}
f((a + ib)x) &= f_1(ax + ibx) - i f_1(-bx + iax) \\
&= a f_1(x) + b f_1(ix) + ib f_1(x) - ia f_1(ix) \\
&= (a + ib)[f_1(x) - i f_1(ix)] \\
&= (a + ib) f(x).
\end{aligned}$$

Note that homogeneity of $f_1$ does not immediately imply homogeneity of $f$. For $f_1(\lambda x) = \lambda f_1(x)$ for real $\lambda$ but not in general for complex $\lambda$. [For example, let $L = \mathbb{C}$, $f(x) = x$, $f_1(x) = \operatorname{Re} x$.]
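The passage from $f_1$ to $f$ is easy to test on a concrete example. In the sketch below, the space $\mathbb{C}^2$, the particular real-linear functional `f1`, and the sample values are hypothetical choices made for illustration; the code forms $f(x) = f_1(x) - i f_1(ix)$ and checks complex homogeneity at one point.

```python
def f1(x):
    """A real-linear functional on C^2 (illustrative choice)."""
    return 2.0 * x[0].real - 3.0 * x[0].imag + x[1].imag

def f(x):
    """Complexification f(x) = f1(x) - i f1(ix)."""
    ix = [1j * t for t in x]
    return complex(f1(x), -f1(ix))

def scale(lam, x):
    """Scalar multiple lam * x in C^2."""
    return [lam * t for t in x]

x = [1 + 2j, -0.5 + 0.25j]
lam = 3 - 4j
print(abs(f(scale(lam, x)) - lam * f(x)) < 1e-12)   # True: f is complex-homogeneous
print(f(x).real == f1(x))                            # True: Re f = f1
print(f1(scale(1j, x)) == 1j * f1(x))                # False: f1 is only real-homogeneous
```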

We now prove the complex version of the Hahn-Banach theorem.


3.4.3 Theorem. Let $L$ be a vector space over $\mathbb{C}$, and $p$ a seminorm on $L$. If $g$ is a linear functional on the subspace $M$, and $|g| \le p$ on $M$, there is a linear functional $f$ on $L$ such that $f = g$ on $M$ and $|f| \le p$ on $L$.

Proof. Since $p$ is a seminorm, it is subadditive and absolutely homogeneous, and hence positive-homogeneous. If $g_1 = \operatorname{Re} g$, then $g_1 \le |g| \le p$; so by 3.4.2, there is an extension of $g_1$ to a linear map $f_1$ of $L$ to $\mathbb{R}$ such that $f_1 = g_1$ on $M$ and $f_1 \le p$ on $L$. Define $f(x) = f_1(x) - i f_1(ix)$, $x \in L$. Then $f$ is a linear functional on $L$ and $f = g$ on $M$. Fix $x \in L$, and let $f(x) = re^{i\theta}$, $r \ge 0$. Then

$$\begin{aligned}
|f(x)| = r &= f(e^{-i\theta}x) \\
&= f_1(e^{-i\theta}x) && \text{since } r \text{ is real} \\
&\le p(e^{-i\theta}x) && \text{since } f_1 \le p \text{ on } L \\
&= p(x) && \text{by absolute homogeneity.} \quad \square
\end{aligned}$$


3.4.4 Corollary. Let $g$ be a continuous linear functional on the subspace $M$ of the normed linear space $L$. There is an extension of $g$ to a continuous linear functional $f$ on $L$ such that $\|f\| = \|g\|$.

Proof. Let $p(x) = \|g\|\,\|x\|$; then $p$ is a seminorm on $L$ and $|g| \le p$ on $M$ by definition of $\|g\|$. The result follows from 3.4.3. □

A direct application of the Hahn-Banach theorem is the result that in a normed linear space, there are enough continuous linear functionals to distinguish points; in other words, if $x \ne y$, there is a continuous linear functional $f$ such that $f(x) \ne f(y)$. We now prove this, along with other related results.


3.4.5 Theorem. Let $M$ be a subspace of the normed linear space $L$, and let $L^*$ be the collection of all continuous linear functionals on $L$.

(a) If $x_0 \notin \overline{M}$, there is an $f \in L^*$ such that $f = 0$ on $M$, $f(x_0) = 1$, and $\|f\| = 1/d$, where $d$ is the distance from $x_0$ to $M$.
(b) $x_0 \in \overline{M}$ iff every $f \in L^*$ that vanishes on $M$ also vanishes at $x_0$.
(c) If $x_0 \ne 0$, there is an $f \in L^*$ such that $\|f\| = 1$ and $f(x_0) = \|x_0\|$; thus the maximum value of $|f(x)|/\|x\|$, $x \ne 0$, is achieved at $x_0$. In particular, if $x \ne y$, there is an $f \in L^*$ such that $f(x) \ne f(y)$.

Proof. (a) First note that $L(M \cup \{x_0\})$ is the set of all elements $y = x + ax_0$, $x \in M$, $a \in \mathbb{C}$, and since $x_0 \notin M$, $a$ is uniquely determined by $y$. Define $f$ on $N = L(M \cup \{x_0\})$ by $f(x + ax_0) = a$; $f$ is linear, and furthermore, $\|f\| = 1/d$, as we now prove. By 3.3.1 we have

$$\begin{aligned}
\|f\| &= \sup\Bigl\{\frac{|f(y)|}{\|y\|} : y \in N,\ y \ne 0\Bigr\} \\
&= \sup\Bigl\{\frac{|a|}{\|x + ax_0\|} : x \in M,\ a \in \mathbb{C},\ x \ne 0 \text{ or } a \ne 0\Bigr\} \\
&= \sup\Bigl\{\frac{|a|}{\|x + ax_0\|} : x \in M,\ a \in \mathbb{C},\ a \ne 0\Bigr\}
\end{aligned}$$

since $f(y) = 0$ when $a = 0$. Now

$$\frac{|a|}{\|x + ax_0\|} = \frac{1}{\|x_0 + x/a\|} = \frac{1}{\|x_0 - z\|}, \quad \text{where } z = -x/a \in M;$$

hence $\|f\| = (\inf\{\|x_0 - z\| : z \in M\})^{-1} = 1/d < \infty$. The result now follows from 3.4.4.

(b) This is immediate from (a).

(c) Apply (a) with $M = \{0\}$, to obtain $g \in L^*$ with $g(x_0) = 1$ and $\|g\| = 1/\|x_0\|$; set $f = \|x_0\|\,g$. □


The Hahn-Banach theorem is basic in the study of the concept of reflexivity, which we now discuss. Let $L$ be a normed linear space, and $L^*$ the set of continuous linear functionals on $L$; $L^*$ is sometimes called the conjugate space of $L$. By 3.3.5, $L^*$ is a Banach space, so that we may talk about $L^{**}$, the conjugate space of $L^*$, or the second conjugate space of $L$. We may identify $L$ with a subspace of $L^{**}$ as follows: If $x \in L$, we define $x^{**} \in L^{**}$ by

$$x^{**}(f) = f(x), \quad f \in L^*.$$

If $\|f_n - f\| \to 0$, then $f_n(x) \to f(x)$; hence $x^{**}$ is in fact a continuous linear functional on $L^*$. Let us examine the map $x \to x^{**}$ of $L$ into $L^{**}$.


3.4.6 Theorem. Define $h\colon L \to L^{**}$ by $h(x) = x^{**}$. Then $h$ is an isometric isomorphism of $L$ and the subspace $h(L)$ of $L^{**}$; therefore, if $x \in L$, we have, by 3.3.1, $\|x\| = \|x^{**}\| = \sup\{|f(x)| : f \in L^*,\ \|f\| \le 1\}$.

Proof. To show that $h$ is linear, we write

$$[h(ax + by)](f) = f(ax + by) = af(x) + bf(y) = [ah(x)](f) + [bh(y)](f).$$

We now prove that $h$ is norm-preserving ($\|h(x)\| = \|x\|$ for all $x \in L$); consequently, $h$ is one-to-one. If $x \in L$, $|[h(x)](f)| = |f(x)| \le \|x\|\,\|f\|$, and hence $\|h(x)\| \le \|x\|$. On the other hand, by 3.4.5(c), there is an $f \in L^*$ such that $\|f\| = 1$ and $|f(x)| = \|x\|$. Thus

$$\sup\{|[h(x)](f)| : f \in L^*,\ \|f\| = 1\} \ge \|x\|,$$

so that $\|h(x)\| \ge \|x\|$, and consequently $\|h(x)\| = \|x\|$. □


If h(L) = L**, L is said to be reflexive. Note that L** is complete by 3.3.5 
and so, by 3.4.6, a reflexive normed linear space is necessarily complete. We 
shall now consider some examples. 


3.4.7 Examples. (a) Every Hilbert space is reflexive. For if $\psi$ is the conjugate isometry of 3.3.4(a), $H^*$ becomes a Hilbert space if we take $\langle f, g\rangle = \langle \psi(g), \psi(f)\rangle$. Thus if $q \in H^{**}$, we have, for some $g \in H^*$, $q(f) = \langle f, g\rangle = \langle \psi(g), \psi(f)\rangle = f(x)$, where $x = \psi(g)$. Therefore $q = h(x)$.

(b) If $1 < p < \infty$, $l^p$ is reflexive. For by 3.3.4(b), $(l^p)^*$ is isometrically isomorphic to $l^q$, where $(1/p) + (1/q) = 1$. Thus if $t \in (l^p)^{**}$ we have $t(y) = \sum_k y_k z_k$, $y \in l^q$, where $z$ is an element of $(l^q)^* = l^p$. But then $t = h(z)$.

Essentially the same argument, with the aid of Problem 11 of Section 3.3, shows that if $(\Omega, \mathcal{F}, \mu)$ is an arbitrary measure space, $L^p(\Omega, \mathcal{F}, \mu)$ is reflexive for $1 < p < \infty$.

(c) The space $l^1$ is not reflexive. This depends on the following result.


3.4.8 Theorem. If $L$ is a normed linear space and $L^*$ is separable, so is $L$. Thus if $L$ is reflexive and separable (so that $L^{**}$ is separable), then so is $L^*$.

Proof. Let $f_1, f_2, \ldots$ form a countable dense subset of $\{f \in L^* : \|f\| = 1\}$ (note that any subset of a separable metric space is separable). Since $\|f_n\| = 1$, we can find points $x_n \in L$ with $\|x_n\| = 1$ and $|f_n(x_n)| \ge \frac{1}{2}$ for all $n$. Let $M$ be the space spanned by the $x_n$; we claim that $M = L$. If not, 3.4.5(a) yields an $f \in L^*$ with $f = 0$ on $M$ and $\|f\| = 1$. But then $\frac{1}{2} \le |f_n(x_n)| = |f_n(x_n) - f(x_n)| \le \|f_n - f\|$ for all $n$, contradicting the assumption that $\{f_1, f_2, \ldots\}$ is dense. □


To return to 3.4.7(c), we note that $l^1$ is separable since $\{x \in l^1 : x_k = 0$ for all but finitely many $k\}$ is dense. [If $x \in l^1$ and $x^{(n)} = (x_1, \ldots, x_n, 0, 0, \ldots)$ then $x^{(n)} \to x$ in $l^1$.] But $(l^1)^* = l^\infty$ by 3.3.4(c), and this space is not separable. For if $S = \{x \in l^\infty : x_k = 0$ or 1 for all $k\}$, then $S$ is uncountable and $\|x - y\| = 1$ for all $x, y \in S$, $x \ne y$. Thus the sets $B_x = \{y \in l^\infty : \|y - x\| < \frac{1}{2}\}$, $x \in S$, form an uncountable family of disjoint open sets. If there were a countable dense set $D$, there would be at least one point of $D$ in each $B_x$, a contradiction. Thus $l^1$ is not reflexive.
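The key fact about $S$, that distinct 0-1 sequences are at sup-distance exactly 1, can be checked on a finite truncation. The sketch below assumes sequences cut off after 4 coordinates (so there are only 16 of them, whereas the full set $S$ is uncountable); the name `sup_dist` is illustrative.

```python
from itertools import product

def sup_dist(x, y):
    """Sup-norm distance between two finite sequences of equal length."""
    return max(abs(a - b) for a, b in zip(x, y))

S = list(product((0, 1), repeat=4))    # truncation of S to 4 coordinates
distances = {sup_dist(x, y) for x in S for y in S if x != y}
print(len(S), distances)   # 16 {1}
```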

We now consider the second basic result of this section. Suppose that the $A_i$, $i$ belonging to the arbitrary index set $I$, are bounded linear operators from $L$ to $M$, where $L$ and $M$ are normed linear spaces. The uniform boundedness principle asserts that if $L$ is complete and the $A_i$ are pointwise bounded, that is, $\sup\{\|A_i x\| : i \in I\} < \infty$ for each $x \in L$, then the $A_i$ are uniformly bounded, that is, $\sup\{\|A_i\| : i \in I\} < \infty$. Completeness of $L$ is essential; to see this, let $L$ be the set of all sequences $x = (x_1, x_2, \ldots)$ of complex numbers such that $x_k = 0$ for all but finitely many $k$, with the $l^p$ norm, $1 \le p \le \infty$. Take $M = \mathbb{C}$, and $A_n x = nx_n$, $n = 1, 2, \ldots$. For any $x$, $A_n x = 0$ for sufficiently large $n$, so the $A_n$ are pointwise bounded, although $\|A_n\| = n \to \infty$.
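The counterexample can be made concrete as follows. This is a sketch only: finitely supported sequences are stored as Python lists (entries beyond the end of the list are taken to be 0), and the names `A` and `e` and the sample vector are assumptions of the illustration.

```python
def A(n, x):
    """A_n x = n * x_n for a finitely supported sequence x (stored as a list)."""
    return n * x[n - 1] if n <= len(x) else 0.0

def e(n):
    """Unit vector e_n, stored as a list of length n."""
    return [0.0] * (n - 1) + [1.0]

x = [2.0, -1.0, 0.5]                       # x_k = 0 for k > 3
pointwise_bound = max(abs(A(n, x)) for n in range(1, 10_000))
lower_bounds = [abs(A(n, e(n))) for n in (1, 10, 100)]
print(pointwise_bound)   # 2.0
print(lower_bounds)      # [1.0, 10.0, 100.0]
```

For the fixed $x$ the values $|A_n x|$ are bounded (and eventually 0), while $|A_n e_n| = n$ shows that the operator norms are unbounded.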

The proof that we shall give uses the Baire category theorem, which states 
that if a complete metric space is a countable union of closed sets, one of the 
sets must have a nonempty interior. 


3.4.9 Principle of Uniform Boundedness. Let $A_i$, $i \in I$, be bounded linear operators from the Banach space $L$ to the normed linear space $M$. If the $A_i$ are pointwise bounded, they are uniformly bounded.

Proof. Let $C_n = \{x \in L : \sup_i \|A_i x\| \le n\}$, $n = 1, 2, \ldots$. Since the $A_i$ are pointwise bounded, $\bigcup_{n=1}^{\infty} C_n = L$, and since the $A_i$ are continuous, each $C_n$ is closed. By the Baire category theorem, for some $n$ there is a closed ball $B = \{x \in L : \|x - x_0\| \le r\} \subset C_n$. Now if $\|y\| \le 1$ and $i \in I$, we have

$$\begin{aligned}
\|A_i y\| &= \frac{1}{r}\|A_i z\| && \text{where } z = ry \\
&\le \frac{1}{r}\|A_i(x_0 + z)\| + \frac{1}{r}\|A_i x_0\| \\
&\le \frac{2n}{r} && \text{since } x_0 + z \text{ and } x_0 \text{ belong to } B.
\end{aligned}$$

Thus $\|A_i\| \le 2n/r$. □


The uniform boundedness principle is used in an important way in the study 
of weak convergence, a concept that we now describe. 
3.4.10 Definitions and Comments. The sequence $\{x_n\}$ in the normed linear space $L$ is said to converge weakly to $x \in L$ iff $f(x_n) \to f(x)$ for every $f \in L^*$ (notation: $x_n \xrightarrow{w} x$). Convergence in the metric of $L$ ($\|x_n - x\| \to 0$) will be called strong convergence and will be written simply as $x_n \to x$. [In this terminology, pointwise convergence of the sequence of linear operators $A_n$ to the linear operator $A$ means strong convergence of $A_n x$ to $Ax$ for each $x$ (see 3.3.6).]

It follows from the definitions that strong convergence implies weak convergence. To see that the converse does not hold, let $\{e_1, e_2, \ldots\}$ be an infinite orthonormal sequence in a Hilbert space $H$. If $x \in H$, then $\langle e_n, x\rangle \to 0$ as $n \to \infty$ by Bessel's inequality, hence $e_n \xrightarrow{w} 0$. But $\|e_n\| = 1$, so $e_n$ does not converge strongly to 0.

In a finite-dimensional space, strong and weak convergence coincide. For if $x_n = a_{1n}e_1 + \cdots + a_{kn}e_k$ converges weakly to $x = a_1 e_1 + \cdots + a_k e_k$ (where the $e_i$ are basis vectors for $\mathbb{C}^k$), let $f$ be a continuous linear functional that is 1 at $e_i$ and 0 at $e_j$, $j \ne i$. Then $f(x_n) \to f(x)$, so that $a_{in} \to a_i$ as $n \to \infty$ ($i = 1, \ldots, k$). Thus $x_n \to x$ strongly.

We give a few properties of weak convergence. 


3.4.11 Theorem. (a) A weakly convergent sequence $\{x_n\}$ is bounded, that is, $\sup_n \|x_n\| < \infty$. In fact if $x_n \xrightarrow{w} x_0$, then $\|x_0\| \le \liminf_{n \to \infty} \|x_n\|$.

(b) If $A$ is a bounded linear operator from $L$ to $M$ and the sequence $\{x_n\}$ converges weakly to $x_0$ in $L$, then $Ax_n$ converges weakly to $Ax_0$ in $M$.

(c) If the sequence $\{x_n\}$ converges weakly to $x_0$, then $x_0$ belongs to the subspace spanned by the $x_n$.

(d) Let $M$ be a linear manifold of $L$. If $x$ is the weak limit of some sequence in $M$, then $x$ is the strong limit of some sequence in $M$.

Proof. (a) The $x_n$ may be regarded as continuous linear functionals on $L^*$ with $x_n(f) = f(x_n)$ (see 3.4.6). The $x_n$ are pointwise bounded on $L^*$ since $f(x_n) \to f(x_0)$; hence by the uniform boundedness principle, $\sup_n \|x_n\| < \infty$. Also, if $f \in L^*$,

$$|f(x_0)| = \lim_{n \to \infty} |f(x_n)| = \liminf_{n \to \infty} |f(x_n)| \le \|f\| \liminf_{n \to \infty} \|x_n\|.$$

Since $\|x_0\| = \sup\{|f(x_0)| : f \in L^*,\ \|f\| \le 1\}$, we may conclude that $\|x_0\| \le \liminf_{n \to \infty} \|x_n\|$.

(b) If $g \in M^*$, define $f = g \circ A$; as $A$ is continuous, we have $f \in L^*$, so that $f(x_n) \to f(x_0)$, that is, $g(Ax_n) \to g(Ax_0)$. But $g$ is arbitrary; hence $Ax_n \xrightarrow{w} Ax_0$.

(c) If this is false, 3.4.5(a) yields an $f \in L^*$ with $f(x_0) = 1$ and $f(x_n) = 0$ for all $n$, contradicting $x_n \xrightarrow{w} x_0$.

(d) Let $\{x_n\}$ be a sequence in $M$ converging weakly to $x$. By (c), $x$ is a strong limit of a sequence of finite linear combinations of the $x_n$. However, these finite linear combinations belong to $M$ since $M$ is a linear manifold. Thus $x$ is the strong limit of a sequence in $M$. □


In order to characterize weak convergence in specific spaces, the following 
result is useful. 


3.4.12 Theorem. Let $E$ be a subset of $L^*$ such that $S(E) = L^*$, and let $\{x_n\}$ be a sequence in $L$. If $x_0 \in L$, then $x_n \xrightarrow{w} x_0$ iff $\sup_n \|x_n\| < \infty$ and $f(x_n) \to f(x_0)$ for all $f \in E$.

Proof. The "only if" part follows from 3.4.11(a) and the definition of weak convergence. For the "if" part, let $f \in L^*$, and choose elements $f_k \in L(E)$ with $\|f - f_k\| \to 0$. Then

$$\begin{aligned}
|f(x_n) - f(x_0)| &\le |f(x_n) - f_k(x_n)| + |f_k(x_n) - f_k(x_0)| + |f_k(x_0) - f(x_0)| \\
&\le \|f - f_k\|\,\|x_n\| + |f_k(x_n) - f_k(x_0)| + \|f_k - f\|\,\|x_0\|.
\end{aligned}$$

Since the $x_n$ are bounded in norm, given $\varepsilon > 0$, we may choose $k$ such that the right-hand side is at most $|f_k(x_n) - f_k(x_0)| + \varepsilon$; but $f_k(x_n) \to f_k(x_0)$ as $n \to \infty$ since $f_k \in L(E)$, and since $\varepsilon$ is arbitrary, the result follows. □


We now describe weak convergence in $l^p$ and $L^p$.

3.4.13 Theorem. Assume $1 < p < \infty$.

(a) Let $x_n = (x_{n1}, x_{n2}, \ldots) \in l^p$, $n = 1, 2, \ldots$. If $z \in l^p$, then $x_n \xrightarrow{w} z$ iff $\sup_n \|x_n\| < \infty$ and $x_{nk} \to z_k$ as $n \to \infty$ for each $k$.

(b) Let $x_n \in L^p(\Omega, \mathcal{F}, \mu)$, $n = 1, 2, \ldots$, where $\mu$ is assumed finite. If $z \in L^p(\Omega, \mathcal{F}, \mu)$, then $x_n \xrightarrow{w} z$ iff $\sup_n \|x_n\| < \infty$ and $\int_A x_n\,d\mu \to \int_A z\,d\mu$ for each $A \in \mathcal{F}$. (It will often be convenient to blur the distinction between $\mathcal{L}^p$ and $L^p$, and treat the elements of $L^p$ as functions rather than equivalence classes.)

Proof. (a) Define $f_k \in (l^p)^*$ by $f_k(x) = x_k$; then $f_k$ corresponds to the sequence in $l^q$ with a one in position $k$ and zeros elsewhere [see 3.3.4(b)]. Take $E = \{f_1, f_2, \ldots\}$ and apply 3.4.12.

(b) For each $A \in \mathcal{F}$, define $f_A \in (L^p)^*$ by $f_A(x) = \int_A x\,d\mu$; $f_A$ corresponds to the indicator function $I_A \in L^q$ (see Problem 11, Section 3.3). Take $E$ as the set of all $f_A$, $A \in \mathcal{F}$; $E$ spans $(L^p)^*$ because the simple functions are dense in $L^q$, and the result follows from 3.4.12. □

Note that 3.4.13(b) holds also when $p = 1$ since simple functions are dense in $L^\infty$.

We now consider the third basic result, the open mapping theorem. This will allow us to conclude that under certain conditions, the inverse of a one-to-one continuous linear operator is continuous. We cannot make this assertion in general, as the following example shows: Let $L$ be the set of all continuous complex-valued functions $x$ on $[0, 1]$ such that $x(0) = 0$, $M = \{x \in L : x$ has a continuous derivative on $[0, 1]\}$; put the sup norm on $L$ and $M$. If $A$ is defined by $(Ax)(t) = \int_0^t x(s)\,ds$, $0 \le t \le 1$, then $A$ is a one-to-one, bounded, linear operator from $L$ onto $M$. [The condition $x(0) = 0$ is used in showing that $A$ is onto.]

But $A^{-1}$ is discontinuous; for example, if $x_n(t) = \sin nt$, then $y_n(t) = (Ax_n)(t) = (1 - \cos nt)/n$, so that $y_n \to 0$ in $M$, but $x_n$ has no limit in $L$. A hypothesis under which continuity of the inverse holds is the completeness of both spaces $L$ and $M$. In the above example, $M$ is not complete; for example, a sequence of polynomials may converge uniformly to a continuous function without a continuous derivative (in fact to a continuous nowhere differentiable function).
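The behavior of $x_n$ and $y_n$ in this example can be observed numerically. The sketch below approximates sup norms on $[0, 1]$ by sampling a uniform grid; the grid size and the helper names are assumptions of the illustration, and the printed values are grid approximations rather than exact norms.

```python
import math

def sup_norm(func, samples=2001):
    """Approximate the sup norm on [0, 1] by sampling a uniform grid."""
    return max(abs(func(k / (samples - 1))) for k in range(samples))

def x_n(n):
    """x_n(t) = sin(nt)."""
    return lambda t: math.sin(n * t)

def y_n(n):
    """(A x_n)(t) = integral of sin(ns) from 0 to t = (1 - cos(nt)) / n."""
    return lambda t: (1.0 - math.cos(n * t)) / n

for n in (2, 20, 200):
    print(n, sup_norm(y_n(n)), sup_norm(x_n(n)))
```

The second column is bounded by $2/n$ and so tends to 0, while the third stays close to 1: the images $y_n$ shrink although the $x_n$ do not.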

We now state the third basic theorem. 


3.4.14 Open Mapping Theorem. Let A be a bounded linear operator from 
the Banach space L onto the Banach space M. Then A is an open map, that is, 
if D is an open subset of L, then A(D) is an open subset of M. Consequently 
if A is also one-to-one, then A~! is bounded. 


Proor. Let B, = B(O, r) be the open ball with center at O and radius r in L. If 
we can show that A(B,) contains a ball with center at 0 in M, we are finished, 
since neighborhoods of an arbitrary point are translations of neighborhoods of 
0. Now L = UL, B, and A maps onto M, so M =|), A(B,). Since M is 
complete, we can conclude from the Baire category theorem that some A(B, ) is 
not nowhere dense. Since A(B,) and A(B,,) differ only by a scale factor, A(B,) 
is not nowhere dense. Thus for some yo € M andr > 0, the ball B(yo, 4r) is 
contained in the closure A(B,). It follows that we may select y; = Ax, in 
A(B,) such that || y; — yoll < 2r. [If y; unluckily ends up on the boundary of 
A(B,), approximate yı by a very close element in A(B,); this element can be 
chosen so that its distance to yo is still < 2r.] By the triangle inequality, 


B(y,, 2r) C B(yo, 4r) C A(B)). (1) 
We claim that

$$\text{if } \|y\| < 2r \text{ then } y \in \overline{A(B_2)}. \tag{2}$$

To see this, note that $y = Ax_1 + (y - Ax_1)$, and since $y_1 = Ax_1$, the second
term belongs to $\overline{A(B_1)}$ by (1) [indeed $y_1 - y \in B(y_1, 2r)$, and $\overline{A(B_1)}$ is
symmetric about 0 because $B_1$ is]. Therefore y is $Ax_1$ plus the limit of a sequence
in $A(B_1)$, so that $y \in \overline{A(x_1 + B_1)}$. But $x_1 \in B_1$, so $\|x_1\| < 1$, and consequently
$x_1 + B_1 \subset B_2$, proving (2).

Again because the $B_r$ differ only by a scale factor, we can repeat the above
argument with $B_2$ replaced by $B_k$ for any $k > 0$; $2r$ becomes $kr$, and (2)
becomes:

$$\text{if } \|y\| < kr \text{ then } y \in \overline{A(B_k)}. \tag{3}$$


We are going to show that if $\|y\| < r/2$ then $y \in A(B_1)$. Thus $A(B_1)$, and
hence by scaling the image of any ball with center at 0 in L, contains a
ball with center at 0 in M, completing the proof. We use (3) to generate an
inductive procedure:

Set $k = 1/2$; choose $x_1 \in B_{1/2}$ such that $\|y - Ax_1\| < r/4$.

Now apply (3) with $k = 1/4$; choose $x_2 \in B_{1/4}$ such that
$\|y - Ax_1 - Ax_2\| < r/8$.

Now apply (3) with $k = 1/8$; choose $x_3 \in B_{1/8}$ such that
$\|y - Ax_1 - Ax_2 - Ax_3\| < r/16$, and continue in this fashion. In general,
we select $x_n \in B_{1/2^n}$ such that

$$\Bigl\| y - \sum_{i=1}^{n} Ax_i \Bigr\| < \frac{r}{2^{n+1}}. \tag{4}$$

If $s_n = \sum_{i=1}^{n} x_i$ then for $n > m$ we have $\|s_n - s_m\| \le \sum_{i=m+1}^{n} \|x_i\| \to 0$ as
$n, m \to \infty$, since $\|x_i\| < 2^{-i}$. The completeness of L implies that $\sum_{n=1}^{\infty} x_n$
converges. If x is the sum of the series then $\|x\| \le \sum_{n=1}^{\infty} \|x_n\| < \sum_{n=1}^{\infty} 2^{-n} = 1$,
and by (4), $Ax = y$. Therefore $y \in A(B_1)$. □


The open mapping theorem allows us to prove the closed graph theorem, 
which is often useful in proving that a particular linear operator is bounded. 

First we observe that if L and M are normed linear spaces, we may define
a norm on the product space $L \times M$ by $\|(x, y)\| = (\|x\|^p + \|y\|^p)^{1/p}$, $x \in L$,
$y \in M$, where p is any fixed real number in $[1, \infty)$. For if $x, x' \in L$, $y, y' \in M$,

$$\begin{aligned}
(\|x + x'\|^p + \|y + y'\|^p)^{1/p} &\le [(\|x\| + \|x'\|)^p + (\|y\| + \|y'\|)^p]^{1/p} \\
&\le (\|x\|^p + \|y\|^p)^{1/p} + (\|x'\|^p + \|y'\|^p)^{1/p}
\end{aligned}$$

by Minkowski's inequality applied to $R^2$.
Therefore the triangle inequality is satisfied and we have defined a norm
on $L \times M$ (the other requirements for a norm are immediate). Furthermore,
$(x_n, y_n) \to (x, y)$ iff $x_n \to x$ and $y_n \to y$; thus regardless of the value of p, the
norm on $L \times M$ induces the product topology; also, if L and M are complete,
so is $L \times M$. [The same result is obtained using the analog of the $L^\infty$ norm,
that is, $\|(x, y)\| = \max(\|x\|, \|y\|)$.]


3.4.15 Definition. Let A be a linear operator from L to M, where L and
M are normed linear spaces. We say that A is closed iff the graph
$G(A) = \{(x, Ax): x \in L\}$ is a closed subset of $L \times M$. Equivalently, A is closed iff
the following condition holds:

If $x_n \in L$, $x_n \to x$, and $Ax_n \to y$, then $(x, y) \in G(A)$; in other words,
$y = Ax$. This formulation shows that every bounded linear operator is closed.
The converse holds if L and M are Banach spaces.


3.4.16 Closed Graph Theorem. If A is a closed linear operator from the 
Banach space L to the Banach space M, then A is bounded. 


Proof. Since G(A) is a closed subspace of $L \times M$, it is a Banach space.
Define $P: G(A) \to L$ by $P(x, Ax) = x$. Then P is linear and maps onto L, and
$\|P(x, Ax)\| = \|x\| \le \|(x, Ax)\|$; hence $\|P\| \le 1$ so that P is bounded.
[Alternatively, if $(x_n, Ax_n) \to (x, y)$, then $x_n \to x$, proving continuity of P.] Similarly,
the linear operator $Q: G(A) \to M$ given by $Q(x, Ax) = Ax$ is bounded. If
$P(x, Ax) = 0$, then $x = Ax = 0$, so P is one-to-one. By 3.4.14, $P^{-1}$ is bounded,
and since $A = Q \circ P^{-1}$, A is bounded. □


As an application of the closed graph theorem, we show that if P is the
projection of a Banach space L on a closed subspace M, then P is continuous [see
3.3.3(c)]. Let $\{x_n\}$ be a sequence of points in L with $x_n \to x$, and assume $Px_n$
converges to the element $y \in M$. Recall that in defining a projection operator
it is assumed that L is the direct sum of closed subspaces M and N; thus

$$x_n = y_n + z_n \quad \text{where } y_n = Px_n \in M,\ z_n \in N.$$

Since $x_n \to x$ and $y_n \to y$, it follows that $z_n \to z = x - y$, necessarily in N.
Therefore $x = y + z$, $y \in M$, $z \in N$, so that $y = Px$, proving P closed. By
3.4.16, P is continuous.


Problems 


1. Show that a subadditive, absolutely homogeneous functional on a vector 
space must be nonnegative, and hence a seminorm. Give an example of 
a subadditive, positive-homogeneous functional that fails to be nonneg- 
ative. 

2. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and assume $\mathcal{F}$ is countably generated,
that is, there is a countable set $\mathcal{C} \subset \mathcal{F}$ such that $\sigma(\mathcal{C}) = \mathcal{F}$. (Note that
the minimal field $\mathcal{F}_0$ over $\mathcal{C}$ is also countable; see Problem 9, Section 1.2.) If $\mu$
is $\sigma$-finite on $\mathcal{F}_0$ and $1 \le p < \infty$, show that $L^p(\Omega, \mathcal{F}, \mu)$ is separable.
If in addition there is an infinite collection of disjoint sets $A \in \mathcal{F}$ with
$\mu(A) > 0$, show that $L^1(\Omega, \mathcal{F}, \mu)$ is not reflexive.


3. If L and M are normed linear spaces and [L, M] is complete, show that
M must be complete.


4. Let $A \in [L, M]$, where L and M are normed linear spaces. The adjoint of
A is an operator $A^*: M^* \to L^*$, defined as follows: If $f \in M^*$ we take
$(A^* f)(x) = f(Ax)$, $x \in L$. Establish the following results:

(a) $\|A^*\| = \|A\|$.

(b) $(aA + bB)^* = aA^* + bB^*$ for all $a, b \in C$, $A, B \in [L, M]$.

(c) If $A \in [L, M]$, $B \in [M, N]$, then $(BA)^* = A^*B^*$, where BA is the
composition of A and B.

(d) If $A \in [L, M]$, A maps onto M, and $A^{-1}$ exists and belongs to [M, L],
then $(A^{-1})^* = (A^*)^{-1}$.

5. Define the annihilator of the subset K of the normed linear space L
as $K^\perp = \{f \in L^*: f(x) = 0 \text{ for all } x \in K\}$. Similarly, if $J \subset L^*$, define
$J^\perp = \{x \in L: f(x) = 0 \text{ for all } f \in J\}$. If $A \in [L, M]$, we denote by N(A)
the null space $\{x \in L: Ax = 0\}$, and by R(A) the closure of the range of
A, that is, the closure of $\{Ax: x \in L\}$. Establish the following:

(a) For any $K \subset L$, $K^{\perp\perp} = S(K)$, the space spanned by K.

(b) $R(A)^\perp = N(A^*)$ and $R(A) = N(A^*)^\perp$.

(c) $R(A) = M$ iff $A^*$ is one-to-one.

(d) $R(A^*)^\perp = N(A)$.

(e) For any $J \subset L^*$, $S(J) \subset J^{\perp\perp}$; $S(J) = J^{\perp\perp}$ if L is reflexive.

(f) $R(A^*) \subset N(A)^\perp$; $R(A^*) = N(A)^\perp$ if L is reflexive.

(g) If $R(A^*) = L^*$, then A is one-to-one; the converse holds if L is
reflexive.


6. Consider the Hahn-Banach theorem 3.4.2, with the additional assumption
that L is a normed linear space (or more generally, a topological
vector space) and p is continuous at 0; hence continuous on all of L since
$|p(x) - p(y)| \le p(x - y)$. Show that if L is separable, the theorem may
be proved without Zorn's lemma. It follows that 3.4.3 and 3.4.4 do not
require Zorn's lemma under the above hypothesis.


7. If $\mu_0$ is a finitely additive, nonnegative real-valued set function on a field
$\mathcal{F}_0$ of subsets of a set $\Omega$, use the Hahn-Banach theorem to show that
$\mu_0$ has an extension to a finitely additive, nonnegative real-valued set
function on the class of all subsets of $\Omega$. Thus in one respect, at least,
finite additivity is superior to countable additivity.


8. If the sequence of bounded linear operators $A_n$ on a Banach space
converges pointwise to the (necessarily linear) operator A, show that A is
bounded; in fact

$$\|A\| \le \liminf_{n \to \infty} \|A_n\| \le \sup_n \|A_n\| < \infty.$$


9. Let $\{A_i, i \in I\}$ be a family of continuous linear operators from the
Banach space L to the normed linear space M. Assume the $A_i$ are weakly
bounded, that is, $\sup_i |f(A_i x)| < \infty$ for all $x \in L$ and all $f \in M^*$. Show
that the $A_i$ are uniformly bounded, that is, $\sup_i \|A_i\| < \infty$.


10. (a) If the elements $x_i$, $i \in I$, belong to the normed linear space L, and
$\sup_i |f(x_i)| < \infty$ for each $f \in L^*$, show that $\sup_i \|x_i\| < \infty$.

(b) If A is a linear operator from the normed linear space L to the
normed linear space M, and $f \circ A$ is continuous for each $f \in M^*$,
show that A is continuous.

11. Let L, M, and N be normed linear spaces, with L or M complete, and let
$B: L \times M \to N$ be a bilinear form, that is, B(x, y) is linear in x for each
fixed y, and linear in y for each fixed x. If for each $f \in N^*$, $f(B(x, y))$
is continuous in x for each fixed y, and continuous in y for each fixed
x, show that B is bounded, that is,

$$\sup\{\|B(x, y)\|: \|x\|, \|y\| \le 1\} < \infty.$$

Equivalently, for some positive constant k we have
$\|B(x, y)\| \le k\|x\|\,\|y\|$ for all x, y.


12. Give an example of a closed unbounded operator from one normed linear
space to another.


13. Let A be a bounded linear operator from the Banach space L onto the
Banach space M. Show that there is a positive number k such that for
each $y \in M$ there is an $x \in L$ with $y = Ax$ and $\|x\| \le k\|y\|$. This result
is sometimes called the solvability theorem.


3.5 REFERENCES 

There is a vast literature on functional analysis, and we give only a few 
representative titles. Readable introductory treatments are given in Liusternik
and Sobolev (1961), Taylor (1958), Bachman and Narici (1966), and Halmos
(1951); the last deals exclusively with Hilbert spaces. Among the more
advanced treatments, Dunford and Schwartz (1958, 1963, 1970) emphasize 
normed spaces, Kelley and Namioka (1963) and Schaefer (1966) emphasize 
topological vector spaces. Yosida (1968) gives a broad survey of applications 
to differential equations, semigroup theory, and other areas of analysis. 

More recent works are by Conway (1990) and Wojtaszczyk (1991). 


CHAPTER 4


BASIC CONCEPTS OF PROBABILITY


4.1 INTRODUCTION

The starting point for probability theory is a set $\Omega$ called the sample space,
whose points are in one-to-one correspondence with the possible outcomes of
a given performance of a random experiment. For example, if two dice are
tossed, we may take $\Omega$ to have 36 points, one for each ordered pair $(i, j)$,
$i, j = 1, \ldots, 6$. The sample space for a given experiment is not unique. For
example, if two dice are tossed and N is the sum of the faces, we may take
$\Omega$ to consist of 11 points, corresponding to the outcomes $N = 2, 3, \ldots, 12$.
The particular sample space to be used will be determined by the problem at
hand. For example, in the dice-tossing example above, if we are interested in
the result of the first toss, the sample space corresponding to $N = 2, 3, \ldots, 12$
will not be of value.

An “event” in a random experiment corresponds to a question that can be
answered “yes” or “no.” For example, in the dice-tossing example, let $\Omega$ be
the set of ordered pairs $(i, j)$, $i, j = 1, \ldots, 6$. We may ask the question “Is the
maximum of the two coordinates i and j less than or equal to 2?” The subset
of $\Omega$ associated with a “yes” answer is $A = \{(1, 1), (1, 2), (2, 1), (2, 2)\}$; the
subset associated with a “no” answer is $A^c$.
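
As a small illustration, the dice sample space and the event A can be enumerated directly. This is a minimal sketch; the variable names are ours and carry no special meaning.

```python
from itertools import product

# Sample space for two dice: all ordered pairs (i, j), i, j = 1..6.
omega = set(product(range(1, 7), repeat=2))

# Event "the maximum of the two coordinates is <= 2" and its complement.
A = {w for w in omega if max(w) <= 2}
A_complement = omega - A

print(len(omega))          # 36
print(sorted(A))           # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(len(A_complement))   # 32
```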

Thus it is reasonable to define an event as a subset of $\Omega$. However, in some
situations not all subsets may be regarded as events. As an example, admittedly
somewhat artificial, suppose that a coin is tossed four times, and $\Omega$ consists
of the 16 sequences of length 4 with components H and T. Assume that only
the results of the first two tosses are written down. If A is the set of points
of $\Omega$ corresponding to at least three heads, then A is not “measurable,” that
is, the given information concerning $\omega$ is not sufficient to determine whether
or not $\omega \in A$. More serious problems arise when $\Omega$ is $R^n$; in this case we are
almost always forced by mathematical consistency requirements to take the
event class to be the Borel sets rather than the collection of all subsets of $\Omega$.
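
The coin example can be made concrete. In the sketch below we interpret "the given information is sufficient" to mean that the event is a union of the four classes of outcomes sharing the same first two tosses; the function name `is_determined_by_first_two` is our own.

```python
from itertools import product

omega = [''.join(s) for s in product('HT', repeat=4)]   # the 16 sequences

def is_determined_by_first_two(event):
    """True iff membership in `event` depends only on the first two tosses."""
    event = set(event)
    for prefix in ('HH', 'HT', 'TH', 'TT'):
        atom = {w for w in omega if w.startswith(prefix)}
        # Each class of outcomes with a common prefix must lie
        # entirely inside or entirely outside the event.
        if atom & event and atom - event:
            return False
    return True

at_least_three_heads = {w for w in omega if w.count('H') >= 3}
first_toss_head = {w for w in omega if w[0] == 'H'}

print(is_determined_by_first_two(at_least_three_heads))   # False
print(is_determined_by_first_two(first_toss_head))        # True
```

For instance, HHHT and HHTT agree on the first two tosses, but only the first has at least three heads.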

The development of the mathematical theory will be facilitated if we require
that the event class form a $\sigma$-field. Thus we may form countable unions,
countable intersections, and complements of events and be assured that the
resulting sets are also events.

Finally, we must talk about the probability P(A) assigned to an event A. The 
basic physical requirement is that P(A) correspond to the relative frequency of 
A in a very large number of independent repetitions of the random experiment. 
It follows that P should be a nonnegative, finitely additive set function, with 
$P(\Omega) = 1$. In order to be able to calculate the probability of a limit of events,
we must require P to be countably additive. 

The above discussion may be summarized by saying that P is a probability
measure on the $\sigma$-field $\mathcal{F}$. Thus the basic mathematical object we are to study
is a probability space $(\Omega, \mathcal{F}, P)$.


4.2 DISCRETE PROBABILITY SPACES 

If the sample space $\Omega$ is a finite or countably infinite set, measure-theoretic
difficulties do not arise. We take $\mathcal{F}$ to consist of all subsets of $\Omega$, and assign
probabilities in the following canonical way. Let $\Omega = \{\omega_1, \omega_2, \ldots\}$, and let
$p_1, p_2, \ldots$ be nonnegative numbers whose sum is 1. If A is any subset of $\Omega$,
we define

$$P(A) = \sum_{\omega_i \in A} p_i.$$

In particular,

$$P\{\omega_i\} = p_i.$$

Then P is a probability measure, and the probability of the event A is computed
simply by adding the probabilities of the individual points of A.

If $\Omega$ is countable, $\mathcal{F}$ consists of all subsets of $\Omega$, and P is defined as above,
$(\Omega, \mathcal{F}, P)$ is called a discrete probability space.
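
The canonical assignment is easy to mirror in code. The sketch below uses a hypothetical loaded die; the point masses are illustrative values chosen by us, and exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction

# Hypothetical point masses p_i on a finite sample space (a loaded die).
p = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
     4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}
assert sum(p.values()) == 1

def prob(event):
    """P(A) = sum of p_i over the points of A."""
    return sum((p[w] for w in event), Fraction(0))

evens = {2, 4, 6}
odds = {1, 3, 5}
print(prob(evens))                 # 1/2
print(prob(evens) + prob(odds))    # 1
```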

Classical probability theory was concerned with the special case
$\Omega = \{\omega_1, \ldots, \omega_n\}$, $p_i = 1/n$, $i = 1, \ldots, n$. In this case,

$$P(A) = \frac{\text{number of points in } A}{\text{number of points in } \Omega}.$$

Thus to find P(A) we count the number of favorable outcomes and divide
by the total number of outcomes.

Unless otherwise specified, if $\Omega$ is countable, we always take $\mathcal{F}$ to contain
all subsets of $\Omega$. Thus all subsets of $\Omega$ are events, all functions on $\Omega$ are
measurable, and measure-theoretic machinery is not needed.


4.3 INDEPENDENCE 
Intuitively, two events A and B are independent if a statement concerning
the occurrence or nonoccurrence of one of the events does not change the
odds about the other event. Let us translate the physical requirement into
mathematical terms. Suppose, for example, that $P(A) = 0.4$ and $P(B) = 0.3$.
In a long sequence of repetitions of the random experiment, A will occur
approximately 40% of the time. If B is to be independent of A, the occurrence
of A will not influence the odds about B, so that if we examine only those trials
on which A occurs, B will occur roughly 30% of the time; hence
$P(A \cap B) = (0.4)(0.3) = P(A)P(B)$. Similarly, the nonoccurrence of A will not influence
the odds about B; hence $P(A^c \cap B) = (0.6)(0.3) = P(A^c)P(B)$.

Conversely, if $P(A \cap B) = P(A)P(B)$ and $P(A^c \cap B) = P(A^c)P(B)$, B will
be independent of A. If we examine only the trials on which A occurs, B must
occur roughly 30% of the time in order to have $P(A \cap B) = P(A)P(B)$. Thus
the occurrence of A does not influence the odds about B, and similarly, the
occurrence of $A^c$ does not change the odds about B.

The above discussion suggests that we call the event B independent of A if
and only if $P(A \cap B) = P(A)P(B)$ and $P(A^c \cap B) = P(A^c)P(B)$. However, the
first condition implies the second. Suppose $P(A \cap B) = P(A)P(B)$. Then

$$\begin{aligned}
P(A^c \cap B) &= P(B - A) = P(B - (A \cap B)) \\
&= P(B) - P(A \cap B) && \text{since } A \cap B \subset B \\
&= P(B) - P(A)P(B) && \text{by hypothesis} \\
&= (1 - P(A))P(B) \\
&= P(A^c)P(B).
\end{aligned}$$

Therefore B is independent of A if and only if $P(A \cap B) = P(A)P(B)$. But this
condition is not altered if A and B are interchanged; hence B is independent
of A if and only if A is independent of B. We may therefore formulate the
definition of independence as follows.


4.3.1 Definition. Two events A and B are independent iff
$P(A \cap B) = P(A)P(B)$.


We now consider independence of more than two events. If the events
$A_i$, $i \in I$, are to be independent, and $i_1, \ldots, i_k$ are distinct indices, a statement
about one or more of the events $A_{i_1}, \ldots, A_{i_k}$ should not change the odds about
any of the remaining events. The physical discussion at the beginning of the
section leads to the following requirement.


4.3.2 Definition. Let I be an arbitrary index set, and let $A_i$, $i \in I$, be events
in a given probability space. The $A_i$ are independent iff for all finite collections
$\{i_1, \ldots, i_k\}$ of distinct indices in I, we have

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_k}).$$


4.3.3 Comments. (a) If the events $A_i$, $i \in I$, are independent and any event
is replaced by its complement, independence is maintained, that is,

$$P(B_{i_1} \cap \cdots \cap B_{i_k}) = P(B_{i_1}) \cdots P(B_{i_k}),$$

where $i_1, \ldots, i_k$ are distinct indices and $B_{i_j}$ is either $A_{i_j}$ or $A_{i_j}^c$, $j = 1, \ldots, k$.

We have essentially proved this in the discussion preceding the definition
of independence by showing that $P(A \cap B) = P(A)P(B)$ implies
$P(A^c \cap B) = P(A^c)P(B)$.

(b) If $A_1, \ldots, A_n$ are events such that $A_{i_1}, \ldots, A_{i_k}$ are independent for all
distinct indices $i_1, \ldots, i_k$, $k = 2, \ldots, n - 1$, it does not follow that $A_1, \ldots, A_n$
are independent. For example, let a coin be tossed twice, and assign probability
1/4 to each of the outcomes HH, HT, TH, TT. Let A = {first toss is a head},
B = {second toss is a head}, C = {first toss = second toss} = {HH, TT}. Then

$$\begin{aligned}
P(A \cap B) &= \tfrac14 = P(A)P(B), \\
P(A \cap C) &= \tfrac14 = P(A)P(C), \\
P(B \cap C) &= \tfrac14 = P(B)P(C).
\end{aligned}$$

But

$$P(A \cap B \cap C) = \tfrac14 \ne P(A)P(B)P(C) = \tfrac18.$$

Thus A and B are independent, as are A and C, and also B and C, but A, B,
and C are not independent.
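
The coin example can be verified by brute-force enumeration. The following is a minimal sketch; the helper `prob` simply counts points, which is valid here because every outcome has probability 1/4.

```python
from fractions import Fraction
from itertools import product

omega = [''.join(s) for s in product('HT', repeat=2)]   # HH, HT, TH, TT

def prob(event):
    return Fraction(len(event), len(omega))              # each point has mass 1/4

A = {w for w in omega if w[0] == 'H'}     # first toss is a head
B = {w for w in omega if w[1] == 'H'}     # second toss is a head
C = {w for w in omega if w[0] == w[1]}    # first toss = second toss

pairwise = all(prob(E & F) == prob(E) * prob(F)
               for E, F in [(A, B), (A, C), (B, C)])
triple = prob(A & B & C) == prob(A) * prob(B) * prob(C)
print(pairwise, triple)   # True False
```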

Conversely, if $P(A_1 \cap \cdots \cap A_n) = P(A_1) \cdots P(A_n)$, it does not follow that
$P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$ when $k < n$. For example, suppose that
two dice are tossed, and let $\Omega$ be all ordered pairs $(i, j)$, $i, j = 1, 2, \ldots, 6$,
with probability 1/36 assigned to each point. Let

A = {second die is 1, 2 or 5},
B = {second die is 4, 5 or 6},
C = {the sum of the faces is 9}.

Then

$$\begin{aligned}
P(A \cap B) &= \tfrac16 \ne P(A)P(B) = \tfrac14, \\
P(A \cap C) &= \tfrac{1}{36} \ne P(A)P(C) = \tfrac{1}{18}, \\
P(B \cap C) &= \tfrac{1}{12} \ne P(B)P(C) = \tfrac{1}{18},
\end{aligned}$$

but

$$P(A \cap B \cap C) = \tfrac{1}{36} = P(A)P(B)P(C).$$
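
The dice example can likewise be checked by enumeration. This is a sketch only; `prob` counts points, using the assumption stated in the example that each of the 36 outcomes has probability 1/36.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[1] in (1, 2, 5)}
B = {w for w in omega if w[1] in (4, 5, 6)}
C = {w for w in omega if sum(w) == 9}

print(prob(A & B), prob(A) * prob(B))                  # 1/6 1/4
print(prob(A & C), prob(A) * prob(C))                  # 1/36 1/18
print(prob(B & C), prob(B) * prob(C))                  # 1/12 1/18
print(prob(A & B & C), prob(A) * prob(B) * prob(C))    # 1/36 1/36
```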


4.4 BERNOULLI TRIALS

A sequence of n Bernoulli trials consists of n independent observations, 
with the property that each observation has only two possible results, called 
“success” and “failure.” The probability of success on a given trial is p and 
the probability of failure is q = 1 — p. 

To construct a probability space that meets the given physical requirements,
we take $\Omega$ to be all $2^n$ ordered sequences of length n with components 1 and 0,
with 1 indicating success and 0 failure. Consider a typical sample point $\omega$ with
ones in positions $i_1, \ldots, i_k$ and zeros in positions $i_{k+1}, \ldots, i_n$. If $A_i$ is the event
of obtaining a success on trial i, so that $A_i = \{\omega: \text{the } i\text{th coordinate of } \omega \text{ is } 1\}$,
we have

$$\{\omega\} = A_{i_1} \cap \cdots \cap A_{i_k} \cap A_{i_{k+1}}^c \cap \cdots \cap A_{i_n}^c.$$

Since the trials are independent and $P(A_i) = p$ for all i, the probability
assigned to $\omega$ is determined; it must be

$$P\{\omega\} = P(A_{i_1}) \cdots P(A_{i_k})P(A_{i_{k+1}}^c) \cdots P(A_{i_n}^c) = p^k q^{n-k}.$$

Now there are $\binom{n}{k} = n!/[k!(n - k)!]$ sequences in $\Omega$ having exactly k ones,
because such a sequence is determined by selecting k positions out of n for
the ones to occur. Thus the probability of obtaining exactly k successes is

$$p(k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \ldots, n. \tag{1}$$

By the binomial theorem, $\sum_{k=0}^{n} p(k) = (p + q)^n = 1$; therefore the sum of
the probabilities assigned to all points is 1, and we have a legitimate
probability measure.
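
Formula (1) translates directly into code. The sketch below uses illustrative values of n and p chosen by us, and the function name `binomial_pmf` is our own; `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials: C(n,k) p^k q^(n-k)."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

n, p = 5, Fraction(1, 3)          # illustrative values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf[2])        # 80/243
print(sum(pmf))      # 1
```

Using `Fraction` makes the check that the probabilities sum to 1 exact rather than approximate.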

A sequence of n generalized Bernoulli trials consists of n independent
observations, such that each observation has k possible outcomes ($k \ge 2$). If
the k outcomes are labeled $b_1, \ldots, b_k$, the probability that $b_i$ will occur on a
given trial is $p_i$, where the $p_i$ are nonnegative and $\sum_{i=1}^{k} p_i = 1$.

To construct an appropriate probability space, we take $\Omega$ to be all $k^n$ ordered
sequences of length n with components $b_1, \ldots, b_k$. If $\omega$ is a sample point
having $n_i$ occurrences of $b_i$, $i = 1, \ldots, k$, the independence of the trials and
the assumption that the probability of obtaining $b_i$ on a given trial is $p_i$ leads
us to assign to $\omega$ the probability $p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$.

Now to find the number of sequences in $\Omega$ in which $b_i$ occurs exactly
$n_i$ times, $i = 1, \ldots, k$, we reason as follows. Such a sequence is determined
by selecting $n_1$ positions out of n to be occupied by $b_1$'s, then $n_2$ positions
from the remaining $n - n_1$ for the $b_2$'s, and so on. Thus the total number of
sequences is

$$\binom{n}{n_1}\binom{n - n_1}{n_2} \cdots \binom{n - n_1 - \cdots - n_{k-2}}{n_{k-1}} = \frac{n!}{n_1!\, n_2! \cdots n_k!}.$$

The total probability assigned to all points is

$$\sum \frac{n!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k},$$

where the sum is taken over all nonnegative integers $n_1, \ldots, n_k$ whose sum
is n. But this is $(p_1 + \cdots + p_k)^n = 1$, using the multinomial theorem.
The probability that $b_1$ will occur $n_1$ times, $b_2$ will occur $n_2$ times, ..., and
$b_k$ will occur $n_k$ times, is

$$p(n_1, \ldots, n_k) = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}, \tag{2}$$

where $n_1, \ldots, n_k = 0, 1, \ldots$, $n_1 + \cdots + n_k = n$.
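
Formula (2) can be checked in the same spirit. In this sketch the outcome probabilities and n are illustrative values of our choosing, and `multinomial_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial_pmf(counts, probs):
    """n!/(n_1! ... n_k!) * p_1^n_1 ... p_k^n_k, with n = sum(counts)."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)          # exact: each partial quotient is an integer
    result = Fraction(coeff)
    for c, p in zip(counts, probs):
        result *= p**c
    return result

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # illustrative
n = 4
total = sum(multinomial_pmf(c, probs)
            for c in product(range(n + 1), repeat=len(probs)) if sum(c) == n)
print(multinomial_pmf((2, 1, 1), probs))   # 1/6
print(total)                               # 1
```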


4.5 CONDITIONAL PROBABILITY 

If two events A and B are independent, a statement about the occurrence 
or nonoccurrence of one of the events does not change the odds about the 
other. In the absence of independence, the odds are altered, and the concept 
of conditional probability gives a quantitative measure of the change. 

For example, suppose that the probability of A is 0.4 and the probability
of $A \cap B$ is 0.1. If we repeat the experiment independently a large number
of times and examine only the trials on which A has occurred, B will occur
roughly 25% of the time. In general, the ratio $P(A \cap B)/P(A)$ is a measure of
the probability of B under the condition that A is known to have occurred.

We therefore define the conditional probability of B given A as

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \tag{1}$$

provided $P(A) > 0$.

In the next chapter, we shall discuss in detail the concept of conditional 
probability P(B|A) when the event A has probability 0. This is not a degenerate 
case; there are many natural and intuitive examples. Of course, the definition 
(1) no longer makes sense, and the approach will be somewhat indirect. At 
this point we shall only derive a few consequences of (1). 
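
Definition (1) can be illustrated on the two-dice space. This is a sketch under our own choice of events; `cond_prob` is a hypothetical helper that refuses to condition on an event of probability 0, in line with the proviso above.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(B, A):
    """P(B | A) = P(A and B) / P(A); only defined when P(A) > 0."""
    if prob(A) == 0:
        raise ValueError("P(A) must be positive")
    return prob(A & B) / prob(A)

A = {w for w in omega if w[0] == 6}        # first die shows 6
B = {w for w in omega if sum(w) >= 10}     # sum is at least 10
print(prob(B))            # 1/6
print(cond_prob(B, A))    # 1/2
```

Knowing that the first die shows 6 raises the odds of a sum of at least 10 from 1/6 to 1/2, so A and B are not independent.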




4.5.1 Theorem. (a) If $P(A) > 0$, A and B are independent iff
$P(B \mid A) = P(B)$. [Similarly, independence is equivalent to $P(A \mid B) = P(A)$ if $P(B) > 0$.]
(b) If $P(A_1 \cap \cdots \cap A_{n-1}) > 0$, then

$$P(A_1 \cap \cdots \cap A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1}).$$


Proof. Part (a) follows from the definitions of independence and conditional
probability. To prove (b), observe that $P(A_1 \cap \cdots \cap A_{n-1}) > 0$ implies that
$P(A_1), P(A_1 \cap A_2), \ldots, P(A_1 \cap \cdots \cap A_{n-2}) > 0$, so all conditional
probabilities are well defined. Now by the definition of conditional probability,

$$P(A_1 \cap \cdots \cap A_n) = P(A_1 \cap \cdots \cap A_{n-1})P(A_n \mid A_1 \cap \cdots \cap A_{n-1}).$$

An induction argument completes the proof. □


The following result will be quite useful. 


4.5.2 Theorem of Total Probability. Let $B_1, B_2, \ldots$ form a finite or countably
infinite family of mutually exclusive and exhaustive events, that is, the $B_i$ are
disjoint and their union is $\Omega$.


(a) If A is any event, then $P(A) = \sum_i P(A \cap B_i)$. Thus P(A) is calculated
by making a list of mutually exclusive exhaustive ways in which A can happen,
and adding the individual probabilities.

(b) $P(A) = \sum_i P(B_i)P(A \mid B_i)$, where the sum is taken over those i for
which $P(B_i) > 0$. Thus P(A) is a weighted average of the conditional
probabilities $P(A \mid B_i)$.


PROOF.
(a) $P(A) = P(A \cap \Omega) = P\bigl(A \cap \bigcup_i B_i\bigr) = P\bigl(\bigcup_i (A \cap B_i)\bigr) = \sum_i P(A \cap B_i)$.
(b) This follows from (a) and the fact that $P(A \cap B_i) = 0$ if $P(B_i) = 0$. □


4.5.3 Example. A positive integer I is selected, with $P\{I = n\} = (\tfrac12)^n$,
$n = 1, 2, \ldots$. If I takes the value n, a coin with probability $e^{-n}$ of heads is
tossed once. Find the probability that the resulting toss is a head.

Here we have specified $P(B_n) = (\tfrac12)^n$, where $B_n = \{I = n\}$, $n = 1, 2, \ldots$.
If A is the event that the coin comes up heads, we have specified
$P(A \mid B_n) = e^{-n}$. By the theorem of total probability, this is enough to determine P(A).
Formally, we may take $\Omega$ to consist of all ordered pairs $(n, m)$, $n = 1, 2, \ldots$,
$m = 0$ (tail) or 1 (head). We assign to the point $(n, 1)$ the probability $(\tfrac12)^n e^{-n}$,
and to $(n, 0)$ the probability $(\tfrac12)^n (1 - e^{-n})$. We may then verify that
$P(B_n) = (\tfrac12)^n$, $P(A \mid B_n) = e^{-n}$. Hence

$$P(A) = \sum_{n=1}^{\infty} \left(\tfrac12\right)^n e^{-n} = \frac{\tfrac12 e^{-1}}{1 - \tfrac12 e^{-1}}.$$
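
As a numerical sanity check on this example, the series given by the theorem of total probability can be summed directly and compared with the geometric-series closed form, which simplifies to $1/(2e - 1)$. The truncation point below is an arbitrary choice of ours; the neglected tail is far below floating-point precision.

```python
import math

# P(A) = sum over n of P(B_n) P(A | B_n) = sum of (1/2)^n e^(-n), truncated.
partial = sum(0.5**n * math.exp(-n) for n in range(1, 200))
closed_form = 0.5 * math.exp(-1) / (1 - 0.5 * math.exp(-1))   # geometric series

print(partial, closed_form, 1 / (2 * math.e - 1))
```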


4.6 RANDOM VARIABLES 

Intuitively, a random variable is a quantity that is measured in connection
with a random experiment. If $(\Omega, \mathcal{F}, P)$ is a probability space and the outcome
of the experiment corresponds to the point $\omega \in \Omega$, a measuring process is
carried out to obtain a number $X(\omega)$. Thus X is a function from the sample space
$\Omega$ to the reals (or the extended reals). For example, if $(\Omega, \mathcal{F}, P)$ corresponds
to a sequence of four Bernoulli trials (4.4) and X is the number of successes,
then X(1 0 1 1) = 3, X(0 1 0 0) = 1, and so on.
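
The point that a random variable is simply a function on the sample space can be made concrete. The sketch below encodes outcomes of four Bernoulli trials as tuples of 0s and 1s (an encoding we chose for illustration) and computes the preimage of a set of values.

```python
from itertools import product

n = 4
omega = list(product((0, 1), repeat=n))    # all 2^n outcomes, 1 = success

def X(w):
    """Number of successes in the outcome w."""
    return sum(w)

print(X((1, 0, 1, 1)), X((0, 1, 0, 0)))    # 3 1

# Preimage of a set of values: the event {X in B}.
B = {3, 4}
event = [w for w in omega if X(w) in B]
print(len(event))                          # 5
```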

If we are interested in a random variable X defined on a given probability
space, we generally want to know the probability of events involving X; for
example, the probability that in a given performance of the experiment the
value of X will belong to B, where B is a set of real numbers. In particular,
we will be interested in $P\{\omega: a < X(\omega) \le b\}$ for all real a, b. Thus $\mathcal{F}$ must
contain all sets of the form $X^{-1}(a, b]$, and therefore all sets $X^{-1}(B)$, B a Borel
set in R.


4.6.1 Definitions. A random variable X on a probability space $(\Omega, \mathcal{F}, P)$ is
a Borel measurable function from $\Omega$ to R. In the terminology of 1.5, we have
$X: (\Omega, \mathcal{F}) \to (R, \mathcal{B}(R))$. In many situations it is convenient to allow X to
take on the values $\pm\infty$; X is said to be an extended random variable iff X is
a Borel measurable function from $\Omega$ to $\overline{R}$, that is, $X: (\Omega, \mathcal{F}) \to (\overline{R}, \mathcal{B}(\overline{R}))$.

If X is a random variable on $(\Omega, \mathcal{F}, P)$ the probability measure induced by
X is the probability measure $P_X$ on $\mathcal{B}(R)$ given by

$$P_X(B) = P\{\omega: X(\omega) \in B\}, \quad B \in \mathcal{B}(R).$$

The numbers $P_X(B)$, $B \in \mathcal{B}(R)$, completely characterize the random
variable X in the sense that they provide the probabilities of all events involving
X. It is useful to know that this information may be captured by a single
function from R to R.


4.6.2 Definition. The distribution function of a random variable X is the
function $F = F_X$ from R to [0, 1] given by

$$F(x) = P\{\omega: X(\omega) \le x\}, \quad x \text{ real}.$$

Since, for $a < b$, $F(b) - F(a) = P\{\omega: a < X(\omega) \le b\} = P_X(a, b]$, F is a
distribution function corresponding to the Lebesgue-Stieltjes measure $P_X$
(1.4). In particular, F is increasing and right-continuous. By 1.2.7, $F(x) \to 1$
as $x \to \infty$ and $F(x) \to 0$ as $x \to -\infty$. Thus among all distribution functions
corresponding to $P_X$, we choose the one with $F(\infty) = 1$, $F(-\infty) = 0$.

Very often, the following statement is made: “Let X be a random variable
with distribution function F,” where F is a given function from R to [0, 1] that
is increasing and right-continuous, with $F(\infty) = 1$, $F(-\infty) = 0$. There is no
reference to the underlying probability space $(\Omega, \mathcal{F}, P)$, and actually the nature
of the underlying space is not important. The distribution function F
determines the probability measure $P_X$, which in turn determines the probability of
all events involving X. The only thing we have to check is that there be at least
one $(\Omega, \mathcal{F}, P)$ on which a random variable X with distribution function F can
be defined. In fact we can always supply the probability space in a canonical
way; take $\Omega = R$, $\mathcal{F} = \mathcal{B}(R)$, with P the Lebesgue-Stieltjes measure
corresponding to F, and define $X(\omega) = \omega$, $\omega \in \Omega$, that is, X is the identity map.
Since $P_X(B) = P\{\omega: X(\omega) \in B\} = P(B)$, X has induced probability measure P
and therefore distribution function F.

Thus if $F: R \to [0, 1]$ is increasing and right-continuous, with $F(\infty) = 1$,
$F(-\infty) = 0$, then F is the distribution function of some random variable.

We isolate some particularly important classes of random variables. 


4.6.3 Definitions and Comments. Let X be a random variable on $(\Omega, \mathcal{F}, P)$.
We say that X is simple iff X can take on only finitely many possible values,
discrete iff the set of values of X is finite or countably infinite. (Any random
variable on a discrete probability space is discrete, since $\Omega$ is countable.)


If X is discrete, and the values $\{x_n\}$ of X can be arranged so that
$x_n < x_{n+1}$ for all n, then the distribution function F is a step function with a
discontinuity at each $x_n$, of magnitude $p_n = P\{X = x_n\}$; F is constant between
the $x_n$ and takes the upper value at each discontinuity. To see this, observe
that if $x_{n-1} < a < x_n \le b < x_{n+1}$, then $F(b) - F(a) = P\{a < X \le b\} = p_n$;
if $x_n \le c < d < x_{n+1}$, then $F(d) - F(c) = 0$.
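
The step-function behavior can be seen in a small example. The distribution below is hypothetical (values and probabilities are ours, chosen only for illustration), and `F` is computed straight from the definition $F(x) = P\{X \le x\}$.

```python
from fractions import Fraction

# Hypothetical discrete distribution: values x_n with probabilities p_n.
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

def F(x):
    """Distribution function F(x) = P{X <= x}."""
    return sum((p for v, p in zip(values, probs) if v <= x), Fraction(0))

print(F(-1), F(0), F(Fraction(1, 2)), F(1), F(5))   # 0 1/4 1/4 3/4 1
print(F(1) - F(Fraction(1, 2)))                     # 1/2, the jump at x = 1
```

Note that F(1) already includes the jump at 1, which is the "upper value at each discontinuity" (right-continuity).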

If X is an arbitrary discrete random variable, the properties of X are
completely determined by the probability function $p_X$, defined by

$$p_X(x) = P\{X = x\}, \quad x \in R.$$

Explicitly,

$$P_X(B) = \sum_{x \in B} p_X(x).$$

This is a countable sum since $p_X$ is 0 except at the $x_n$.
Thus a discrete random variable may be specified by giving a countable set
$\{x_n, n = 1, 2, \ldots\}$ of real numbers and a set of probabilities $\{p_n, n = 1, 2, \ldots\}$
($p_n \ge 0$, $\sum_n p_n = 1$), where $p_n$ is to serve as $P\{X = x_n\}$. The probability that
X belongs to B is found by summing the $p_n$ for those $x_n$ which belong to B.

The random variable X is said to be absolutely continuous iff there is a
nonnegative real-valued Borel measurable function f on R such that

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \quad x \in R.$$

We call f the density or density function of X; because $F(x) \to 1$ as $x \to \infty$,
we have $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
If X is absolutely continuous with density f, it follows that

$$P_X(B) = \int_B f(x)\,dx \quad \text{for each } B \in \mathcal{B}(R).$$

For the measure $\mu$ defined by $\mu(B) = \int_B f(x)\,dx$, $B \in \mathcal{B}(R)$, satisfies
$\mu(a, b] = F(b) - F(a)$, $a < b$. Thus $\mu$ is the Lebesgue-Stieltjes measure
corresponding to F; hence $\mu = P_X$.

Thus absolute continuity of X means that $P_X \ll$ Lebesgue measure, or
equivalently, by 2.3.1, $F_X$ is an absolutely continuous function.

Any nonnegative Borel measurable function f on R with $\int_{-\infty}^{\infty} f(x)\,dx = 1$
is the density of some absolutely continuous random variable X. Let
$F(x) = \int_{-\infty}^{x} f(t)\,dt$; F is clearly increasing, and by 2.3.4, F is absolutely
continuous, hence continuous, on R. Since $F(\infty) = 1$, $F(-\infty) = 0$, F is the
distribution function of some random variable X, and X must have density f.

The random variable X is said to be continuous iff its distribution function 
F is continuous on all of R. Equivalently, X is continuous iff P{X = x} = 0 
for all x [see (5) of 1.4.5]. 

We have seen that absolute continuity implies continuity. If F is the Cantor
function (Problem 3, Section 2.3), extended so that $F(x) = 1$ for $x \ge 1$,
and $F(x) = 0$ for $x \le 0$, a random variable with distribution function F is
continuous but not absolutely continuous.

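Some typical examples of density functions are:


(1) Uniform density on [a, b]:

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b, \\ 0, & \text{elsewhere.} \end{cases}$$

(2) Exponential density:

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

where $\lambda > 0$.

(3) Two-sided exponential density:

$$f(x) = \tfrac12 \lambda e^{-\lambda |x|}, \quad \lambda > 0.$$

(4) Normal density:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right], \quad \sigma > 0,\ m \text{ real.}$$

(5) Cauchy density:

$$f(x) = \frac{\theta}{\pi(x^2 + \theta^2)}, \quad \theta > 0.$$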
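
That each of these functions integrates to 1 can be checked numerically. The sketch below uses a simple midpoint rule with parameter values and integration ranges chosen by us for illustration; it assumes the standard forms of the five densities listed above. Because the Cauchy density has heavy tails, its integral over a finite interval is compared with the closed form $(2/\pi)\arctan(T/\theta)$ rather than with 1.

```python
import math

def integrate(f, lo, hi, steps=100000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b, lam, m, sigma, theta = 1.0, 3.0, 2.0, 0.5, 1.5, 1.0   # illustrative parameters

uniform = lambda x: 1.0 / (b - a) if a <= x <= b else 0.0
exponential = lambda x: lam * math.exp(-lam * x) if x >= 0 else 0.0
two_sided = lambda x: 0.5 * lam * math.exp(-lam * abs(x))
normal = lambda x: (math.exp(-(x - m) ** 2 / (2 * sigma ** 2))
                    / (math.sqrt(2 * math.pi) * sigma))
cauchy = lambda x: theta / (math.pi * (x ** 2 + theta ** 2))

print(integrate(uniform, a, b))
print(integrate(exponential, 0.0, 40.0))
print(integrate(two_sided, -40.0, 40.0))
print(integrate(normal, m - 12 * sigma, m + 12 * sigma))
# Over [-T, T] the Cauchy mass is (2/pi) * atan(T / theta), which is < 1.
T = 1000.0
print(integrate(cauchy, -T, T), 2 / math.pi * math.atan(T / theta))
```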


Finally, some remarks on terminology. We often abbreviate
$\{\omega: X(\omega) \in B\}$ by $\{X \in B\}$; note that this set is also $X^{-1}(B)$, the preimage
of B under the mapping X. Similarly, $\{\omega: a < X(\omega) < b\}$ will be abbreviated
by $\{a < X < b\}$. The letter $\mathcal{B}$ will always stand for the Borel sets of an
appropriate space. Thus $f: (R^n, \mathcal{B}) \to (R, \mathcal{B})$ means that $f^{-1}(B) \in \mathcal{B}(R^n)$ for
each $B \in \mathcal{B}(R)$. The phrase “almost surely,” abbreviated a.s., is often used
in the literature. It means “almost everywhere” with respect to a specified
probability measure.


4.7 RANDOM VECTORS 

We now consider situations involving more than one random variable
associated with the same experiment.

An n-dimensional random vector on a probability space $(\Omega, \mathcal{F}, P)$ is a
Borel measurable map from $\Omega$ to $R^n$.

If $X: \Omega \to R^n$ and $X_i = p_i \circ X$, where $p_i$ is the projection of $R^n$ onto the ith
coordinate space, then X is Borel measurable iff each $X_i$ is Borel measurable
(see 1.5.8). Thus a random vector may be regarded as an n-tuple $(X_1, \ldots, X_n)$
of random variables.

Much of the development of the previous section carries over. As before,
the probability measure induced by the random vector X is defined by

$$P_X(B) = P\{\omega: X(\omega) \in B\}, \quad B \in \mathcal{B}(R^n).$$

The distribution function of X is the function $F = F_X$ from $R^n$ to [0, 1]
defined by

$$F(x) = P_X(-\infty, x] = P\{\omega: X_i(\omega) \le x_i,\ i = 1, \ldots, n\};$$

F is also called the joint distribution function of $X_1, \ldots, X_n$; F is
increasing and right-continuous on $R^n$, and $P_X$ is the Lebesgue-Stieltjes measure
determined by F (see 1.4).

By 1.2.7, we have

$$F(x_1, \ldots, x_n) \to \begin{cases} 1 & \text{as } x_i \uparrow \infty \text{ for all } i, \\ 0 & \text{as } x_i \downarrow -\infty \text{ for any particular } i \\ & \text{(with all other coordinates fixed).} \end{cases} \tag{1}$$

If $F$ is a distribution function on $\mathbb{R}^n$ that satisfies (1), then $F$ is the distribution
function of some random vector $X$, and the underlying probability space
can be constructed in a canonical way. Take $\Omega = \mathbb{R}^n$, $\mathscr{F} = \mathscr{B}(\mathbb{R}^n)$, with $P$
the Lebesgue-Stieltjes measure determined by $F$, and $X$ the identity function
on $\Omega$. Since $P_X(B) = P\{\omega: X(\omega) \in B\} = P(B)$, $X$ has induced probability
measure $P$. Furthermore, the distribution function of $X$ is

$$\begin{aligned}
F_X(x) &= P_X(-\infty, x] = P(-\infty, x] \\
&= \lim_{a \downarrow -\infty} P(a, x] && \text{by 1.2.7} \\
&= \lim_{a \downarrow -\infty} F(a, x] && \text{since } P \text{ is the Lebesgue-Stieltjes measure determined by } F \\
&= F(x) && \text{by 1.4.8(b) and the hypothesis that } F(x) \to 0 \text{ if any } x_i \downarrow -\infty.
\end{aligned}$$

[The hypothesis that $F(x) \to 1$ as $x \uparrow \infty$ implies that $P$ is actually a probability
measure.]

The random vector $X$ is said to be discrete iff the set of values of $X$ is finite
or countably infinite, or equivalently, iff the component random variables
$X_1, \ldots, X_n$ are all discrete. In this case, the properties of $X$ are determined
by the probability function $p$, given by $p(x) = P\{X = x\}$; explicitly,
$P\{X \in B\} = \sum_{x \in B} p(x)$.

The random vector $X$ is said to be absolutely continuous iff there is a
nonnegative Borel measurable function $f$ on $\mathbb{R}^n$, called the density or density
function of $X$, such that

$$F(x) = \int_{(-\infty, x]} f(t)\,dt, \qquad x \in \mathbb{R}^n.$$

It follows that

$$P_X(B) = \int_B f(x)\,dx \qquad \text{for all } B \in \mathscr{B}(\mathbb{R}^n).$$

For the measure $\mu$ defined by $\mu(B) = \int_B f(x)\,dx$, $B \in \mathscr{B}(\mathbb{R}^n)$, satisfies
$\mu(a, b] = \int_{(a, b]} f(x)\,dx = F(a, b]$ [see 1.4.10(b)]. Thus $\mu$ is the Lebesgue-Stieltjes
measure determined by $F$; hence $\mu = P_X$.

Just as in the previous section, any nonnegative Borel measurable function
$f$ on $\mathbb{R}^n$ with $\int_{\mathbb{R}^n} f(x)\,dx = 1$ is the density of some absolutely continuous
random vector $X$.


4.8 INDEPENDENT RANDOM VARIABLES 

We have talked previously about independence of events; we now consider
independent random variables. Intuitively, independence of $X_1, \ldots, X_n$ means
that a statement about one or more of the $X_i$ does not affect the odds concerning
the remaining $X_i$. Now a statement about $X_i$ corresponds to an event of the
form $A_i = \{X_i \in B_i\}$; thus the events $A_1, \ldots, A_n$ will be independent. The
formal definition is as follows.


4.8.1 Definition. Let $X_1, \ldots, X_n$ be random variables on $(\Omega, \mathscr{F}, P)$. Then
$X_1, \ldots, X_n$ are said to be independent iff for all sets $B_1, \ldots, B_n \in \mathscr{B}(\mathbb{R})$, we
have

$$P\{X_1 \in B_1, \ldots, X_n \in B_n\} = P\{X_1 \in B_1\} \cdots P\{X_n \in B_n\}.$$

By 2.6.8(b), independence of $X_1, \ldots, X_n$ may be expressed by saying that if
$X = (X_1, \ldots, X_n)$, then $P_X$ is the product of the $P_{X_i}$, $i = 1, \ldots, n$.
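On a finite probability space, Definition 4.8.1 can be checked exhaustively, since every Borel set meets the range of each variable in one of finitely many subsets. The sketch below is only an illustration under assumed choices: two fair dice as the sample space, and the hypothetical helpers `prob`, `subsets`, and `independent`.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed finite probability space: two fair dice, 36 equally likely outcomes.
OMEGA = list(product(range(1, 7), repeat=2))
P_POINT = Fraction(1, len(OMEGA))

def prob(event):
    """P(event), where event is a predicate on outcomes."""
    return sum(P_POINT for w in OMEGA if event(w))

def subsets(values):
    """All subsets of a finite collection of values."""
    values = list(values)
    return [set(c) for r in range(len(values) + 1) for c in combinations(values, r)]

def independent(X, Y):
    """Check P{X in B1, Y in B2} = P{X in B1} P{Y in B2} for all sets of values."""
    x_sets = subsets({X(w) for w in OMEGA})
    y_sets = subsets({Y(w) for w in OMEGA})
    return all(
        prob(lambda w: X(w) in B1 and Y(w) in B2)
        == prob(lambda w: X(w) in B1) * prob(lambda w: Y(w) in B2)
        for B1 in x_sets
        for B2 in y_sets
    )

first = lambda w: w[0]          # value of the first die
second = lambda w: w[1]         # value of the second die
total = lambda w: w[0] + w[1]   # sum of the two dice
```

Here `independent(first, second)` holds, while `independent(first, total)` fails: knowing the first die changes the odds for the sum.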


4.8.2 Comments. (a) If $X_1, \ldots, X_n$ are independent, so are $X_1, \ldots, X_k$ for
$k < n$. To see this, let $B_1, \ldots, B_k \in \mathscr{B}(\mathbb{R})$. Then

$$\begin{aligned}
P\{X_1 \in B_1, \ldots, X_k \in B_k\} &= P\{X_1 \in B_1, \ldots, X_k \in B_k, X_{k+1} \in \mathbb{R}, \ldots, X_n \in \mathbb{R}\} \\
&= P\{X_1 \in B_1\} \cdots P\{X_k \in B_k\}.
\end{aligned}$$

Thus it is not necessary to check all subfamilies of the collection of events
$\{X_i \in B_i\}$, $i = 1, \ldots, n$, as in the definition of independent events in 4.3.

(b) Independence of extended random variables is defined exactly as above,
with $\mathscr{B}(\overline{\mathbb{R}})$ replacing $\mathscr{B}(\mathbb{R})$. In fact, suppose that each $X_i$ is a random object;
that is, a map $X_i: (\Omega, \mathscr{F}) \to (\Omega_i, \mathscr{F}_i)$, where $\Omega_i$ is an arbitrary set and $\mathscr{F}_i$ is a
$\sigma$-field of subsets of $\Omega_i$. Then $X_1, \ldots, X_n$ are said to be independent iff for
all sets $B_1 \in \mathscr{F}_1, \ldots, B_n \in \mathscr{F}_n$ we have

$$P\{X_1 \in B_1, \ldots, X_n \in B_n\} = P\{X_1 \in B_1\} \cdots P\{X_n \in B_n\}.$$

(c) Let $X_i$, $i \in I$ (an arbitrary index set) be an arbitrary family of random
objects. The $X_i$ are said to be independent iff $X_{i_1}, \ldots, X_{i_n}$ are independent for
each finite set $\{i_1, \ldots, i_n\}$ of distinct indices in $I$.

(d) If the $X_i: (\Omega, \mathscr{F}) \to (\Omega_i, \mathscr{F}_i)$ are independent random objects, and
$g_i: (\Omega_i, \mathscr{F}_i) \to (\Omega_i', \mathscr{F}_i')$, then the random objects $g_i \circ X_i$, $i \in I$, are independent.
(“Functions of independent random objects are independent.”) This follows from
the definition of independence and the fact that

$$\{g_i \circ X_i \in B_i\} = \{X_i \in g_i^{-1}(B_i)\}.$$


Independence of random variables may be characterized in terms of distribution
functions as follows.


4.8.3 Theorem. Let $X_1, \ldots, X_n$ be random variables on $(\Omega, \mathscr{F}, P)$. Let $F_i$
be the distribution function of $X_i$, $i = 1, \ldots, n$, and $F$ the distribution function
of $X = (X_1, \ldots, X_n)$. Then $X_1, \ldots, X_n$ are independent iff

$$F(x_1, x_2, \ldots, x_n) = F_1(x_1) F_2(x_2) \cdots F_n(x_n) \qquad \text{for all real } x_1, \ldots, x_n.$$

PROOF. If $X_1, \ldots, X_n$ are independent, then

$$F(x_1, \ldots, x_n) = P\{X_1 \le x_1, \ldots, X_n \le x_n\} = \prod_{i=1}^n P\{X_i \le x_i\} = \prod_{i=1}^n F_i(x_i).$$

Conversely, assume $F(x_1, \ldots, x_n) = \prod_{i=1}^n F_i(x_i)$ for all $x_1, \ldots, x_n$. Then

$$P_X(a, b] = F(a, b] = \prod_{i=1}^n [F_i(b_i) - F_i(a_i)] = \prod_{i=1}^n P_{X_i}(a_i, b_i]$$

[see 1.4.10(a)]. Thus

$$P\{X_1 \in B_1, \ldots, X_n \in B_n\} = P\{X_1 \in B_1\} \cdots P\{X_n \in B_n\} \tag{1}$$

when the $B_i$ are right-semiclosed intervals of reals. Now fix the intervals
$B_2, \ldots, B_n$. The collection $\mathscr{C}$ of sets $B_1 \in \mathscr{B}(\mathbb{R})$ for which (1) holds is a
monotone class including the field of finite disjoint unions of right-semiclosed
intervals, and therefore $\mathscr{C} = \mathscr{B}(\mathbb{R})$ by the monotone class theorem. Applying
the same reasoning to each coordinate in turn, we obtain the independence
of the $X_i$. (Explicitly, we prove by induction that if $B_1, \ldots, B_i$ are arbitrary
Borel sets and $B_{i+1}, \ldots, B_n$ are right-semiclosed intervals, then (1) holds for
$B_1, \ldots, B_n$.) □


We may also characterize independence in terms of densities.


4.8.4 Theorem. If $X = (X_1, \ldots, X_n)$ has a density $f$, then each $X_i$ has a density
$f_i$. Furthermore, in this case $X_1, \ldots, X_n$ are independent iff
$f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$ for all $(x_1, \ldots, x_n)$ except possibly for a Borel subset of
$\mathbb{R}^n$ with Lebesgue measure zero.

PROOF.

$$\begin{aligned}
F_1(x_1) &= P\{X_1 \le x_1\} = P\{X_1 \le x_1, X_2 \in \mathbb{R}, \ldots, X_n \in \mathbb{R}\} \\
&= \int_{-\infty}^{x_1} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(t_1, \ldots, t_n)\,dt_1 \cdots dt_n.
\end{aligned}$$

By definition of absolute continuity, $X_1$ has a density given by

$$f_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_2 \cdots dx_n$$

(in other words, we integrate out the unwanted variables). Borel measurability
of $f_1$ follows from Fubini's theorem. Similarly, each $X_i$ has a density $f_i$,
obtained by integrating out all variables except $x_i$.

Now if $f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$ a.e., then

$$F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \ldots, t_n)\,dt_1 \cdots dt_n = F_1(x_1) \cdots F_n(x_n),$$

so by 4.8.3, the $X_i$ are independent. Conversely, if the $X_i$ are independent,
then

$$F(x_1, \ldots, x_n) = F_1(x_1) \cdots F_n(x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_1(t_1) \cdots f_n(t_n)\,dt_1 \cdots dt_n. \tag{1}$$

Thus (see the end of 4.7) if $g(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$, then

$$P_X(B) = \int_B g(x)\,dx, \qquad B \in \mathscr{B}(\mathbb{R}^n).$$

But $P_X(B) = \int_B f(x)\,dx$, and it follows that $f = g$ a.e. (Lebesgue measure)
by 1.6.11. □


4.8.5 Corollary. If $X_1, \ldots, X_n$ are independent and $X_i$ has density $f_i$,
$i = 1, \ldots, n$, then $X$ has a density $f$ given by $f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$.

PROOF. Equation (1) of 4.8.4 applies. □

If $X_1, \ldots, X_n$ each have a density, it does not follow that $(X_1, \ldots, X_n)$
has a density; thus 4.8.5 is false without the independence hypothesis (see
Problem 1).


Problems 


1. Give an example of random variables $X$ and $Y$ (on the same probability
space) such that $X$ and $Y$ each have densities, but $(X, Y)$ does not.

2. Give an example to show that even if $(X, Y)$ has a density, it is not
determined by the individual densities of $X$ and $Y$.

3. Let $X_1, \ldots, X_n$ be discrete random variables. Show that the $X_i$ are independent
iff

$$P\{X_1 = x_1, \ldots, X_n = x_n\} = \prod_{i=1}^n P\{X_i = x_i\} \qquad \text{for all real } x_1, \ldots, x_n.$$

4. Let $(\Omega, \mathscr{F}, P)$ be a probability space. The classes $\mathscr{C}_i$, $i \in I$, of sets
in $\mathscr{F}$ are said to be independent iff given any choice of $C_i \in \mathscr{C}_i$, $i \in I$,
the events $C_i$ are independent. (Thus the random objects $X_i: (\Omega, \mathscr{F}) \to (\Omega_i', \mathscr{F}_i')$
are independent iff the classes $X_i^{-1}(\mathscr{F}_i')$ are independent.)

If the $\mathscr{C}_i$, $i \in I$, are independent, show that if the following sets are added
to each $\mathscr{C}_i$, the enlarged classes still remain independent.

(a) Proper differences $A - B$, $A, B \in \mathscr{C}_i$, $B \subset A$.
(b) The sets $\emptyset$, $\Omega$.
(c) Countable disjoint unions of sets in $\mathscr{C}_i$.
(d) Limits of monotone sequences in $\mathscr{C}_i$.

Give an example to show that finite intersections cannot be added.

If you are familiar with Zorn's lemma, show that if the $\mathscr{C}_i$ are independent
classes, each closed under finite intersection, the minimal $\sigma$-fields
over the $\mathscr{C}_i$ are also independent.


4.9 SOME EXAMPLES FROM BASIC PROBABILITY

In this section we give a few illustrations to show how some of the computations
done in elementary probability courses fit in with the present measure-theoretic
framework.


4.9.1 Example. Two numbers $X$ and $Y$ are picked at random between 0
and 1. Assume that $X$ and $Y$ are independent and that each is uniformly distributed
(that is, $X$ and $Y$ have densities $f_1$ and $f_2$ given by $f_1(x) = f_2(x) = 1$,
$0 \le x \le 1$, and 0 elsewhere). Let $Z$ be the product $XY$, and let us find the
distribution function of $Z$.

We take $\Omega = \mathbb{R}^2$, $\mathscr{F} = \mathscr{B}(\mathbb{R}^2)$, $X(x, y) = x$, $Y(x, y) = y$. By 4.8.5, $(X, Y)$
must have density $f(x, y) = f_1(x) f_2(y)$; hence

$$P\{(X, Y) \in B\} = \iint_B f(x, y)\,dx\,dy.$$

Thus we take our probability measure to be

$$P(B) = \iint_B f_1(x) f_2(y)\,dx\,dy, \qquad B \in \mathscr{B}(\mathbb{R}^2).$$

Now

$$F(z) = P\{Z \le z\} = P\{(x, y): xy \le z\} = \iint_{xy \le z} f_1(x) f_2(y)\,dx\,dy.$$

Since $X$ and $Y$ are between 0 and 1 (with probability 1), $F(z) = 1$ for $z \ge 1$
and $F(z) = 0$ for $z \le 0$. Since $f_1(x) f_2(y)$ is 1 for $0 \le x \le 1$, $0 \le y \le 1$, and
0 elsewhere, $F(z)$ $(0 < z < 1)$ is the shaded area in Fig. 4.9.1; that is,
$z + \int_z^1 (z/x)\,dx = z - z \ln z$. Thus $Z$ has a density

$$f(z) = F'(z) = \begin{cases} -\ln z, & 0 < z < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

[Figure 4.9.1.]

Note that although $f$ is unbounded, its integral, namely, $F$, is always finite.

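The closed form $F(z) = z - z \ln z$ can be compared with a direct numerical computation of the area of $\{(x, y): xy \le z\}$ inside the unit square. The following is a minimal sketch; the function names and the midpoint-rule discretization are assumptions made for illustration.

```python
import math

def F_closed_form(z):
    """Distribution function of Z = XY as derived in Example 4.9.1."""
    if z <= 0:
        return 0.0
    if z >= 1:
        return 1.0
    return z - z * math.log(z)

def F_numeric(z, steps=100_000):
    """Area of {(x, y) in the unit square: xy <= z}, by the midpoint rule in x."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        # Length of the section {y in [0, 1]: xy <= z} for this x.
        total += max(0.0, min(1.0, z / x))
    return total * h
```

For $0 < z < 1$ the two functions should agree to many decimal places, for example at $z = 0.1, 0.5, 0.9$.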
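4.9.2 Example. Let $X$, $Y$, and $Z$ be independent random variables, each
normally distributed with $m = 0$, $\sigma = 1$; that is, $X$, $Y$, and $Z$ each have the
normal density

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}x^2\right).$$

Let $W = (X^2 + Y^2 + Z^2)^{1/2}$ (take the positive square root so that $W \ge 0$).
Find the distribution function of $W$.

We take $\Omega = \mathbb{R}^3$, $\mathscr{F} = \mathscr{B}(\mathbb{R}^3)$, $X(x, y, z) = x$, $Y(x, y, z) = y$, $Z(x, y, z) = z$, and

$$P(B) = \iiint_B f(x, y, z)\,dx\,dy\,dz, \qquad B \in \mathscr{B}(\mathbb{R}^3),$$

where

$$f(x, y, z) = f(x) f(y) f(z) = (2\pi)^{-3/2} \exp\left[-\tfrac{1}{2}(x^2 + y^2 + z^2)\right].$$

Thus

$$F(w) = P\{W \le w\} = P\{X^2 + Y^2 + Z^2 \le w^2\}.$$

If $w \ge 0$,

$$F(w) = \iiint_{x^2 + y^2 + z^2 \le w^2} (2\pi)^{-3/2} \exp\left[-\tfrac{1}{2}(x^2 + y^2 + z^2)\right] dx\,dy\,dz$$

or in spherical coordinates,

$$\begin{aligned}
F(w) &= \int_0^{2\pi} d\theta \int_0^{\pi} d\phi \int_0^{w} (2\pi)^{-3/2} \exp\left(-\tfrac{1}{2}r^2\right) r^2 \sin\phi \,dr \\
&= (2\pi)^{-3/2} (2\pi)(2) \int_0^{w} r^2 \exp\left(-\tfrac{1}{2}r^2\right) dr.
\end{aligned}$$

Thus $W$ is absolutely continuous, with density

$$f(w) = \begin{cases} \dfrac{2}{\sqrt{2\pi}}\, w^2 \exp\left(-\tfrac{1}{2}w^2\right), & w \ge 0, \\ 0, & w < 0. \end{cases}$$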
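The density obtained for $W$ can be checked numerically: it should integrate to 1, and its integral from 0 to $w$ should match the distribution function. The sketch below assumes a midpoint rule. The closed form in `F_W_closed` is not stated in the text; it follows from one integration by parts and is included only as a cross-check.

```python
import math

def density_W(w):
    """Density of W = (X^2 + Y^2 + Z^2)^(1/2) from Example 4.9.2."""
    if w < 0:
        return 0.0
    return 2.0 / math.sqrt(2 * math.pi) * w * w * math.exp(-0.5 * w * w)

def F_W_numeric(w, steps=100_000):
    """Midpoint-rule approximation of the integral of density_W over [0, w]."""
    if w <= 0:
        return 0.0
    h = w / steps
    return h * sum(density_W((i + 0.5) * h) for i in range(steps))

def F_W_closed(w):
    """Closed form via integration by parts (an addition, not from the text)."""
    if w <= 0:
        return 0.0
    return math.erf(w / math.sqrt(2)) - math.sqrt(2 / math.pi) * w * math.exp(-0.5 * w * w)
```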


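4.9.3 Example. Let $X_1, \ldots, X_n$ be independent random variables, each
with density $f$ and distribution function $F$; that is, $\Omega = \mathbb{R}^n$, $\mathscr{F} = \mathscr{B}(\mathbb{R}^n)$,
$X_j(x_1, \ldots, x_n) = x_j$, $1 \le j \le n$,

$$P(B) = \int_B f(x_1) \cdots f(x_n)\,dx_1 \cdots dx_n, \qquad B \in \mathscr{B}(\mathbb{R}^n).$$

Let $T_k$ be the $k$th smallest of the $X_i$; for example, if $n = 4$, $X_1(\omega) = 2$,
$X_2(\omega) = 1.4$, $X_3(\omega) = -7$, $X_4(\omega) = 8$, then $T_1(\omega) = \min_i X_i(\omega) = X_3(\omega) = -7$,
$T_2(\omega) = X_2(\omega) = 1.4$, $T_3(\omega) = X_1(\omega) = 2$, $T_4(\omega) = \max_i X_i(\omega) = X_4(\omega) = 8$.
[Note that

$$P\{X_i = X_j\} = \iint_{x_i = x_j} f(x_i) f(x_j)\,dx_i\,dx_j = 0 \qquad \text{for } i \ne j,$$

and therefore

$$P\{X_i = X_j \text{ for at least one } i \ne j\} \le \sum_{i \ne j} P\{X_i = X_j\} = 0.$$

Thus ties occur with probability 0 and can be ignored.]

Find the individual distribution functions of the $T_k$, and the joint distribution
function of $(T_1, \ldots, T_n)$.

Now

$$P\{T_k \le x\} = \sum_{i=1}^n P\{T_k \le x, T_k = X_i\} \qquad \text{by 4.5.2} \tag{1}$$

and, for example,

$$\begin{aligned}
P\{T_k \le x, T_k = X_1\} = P\{&X_1 \le x, \text{ exactly } k - 1 \text{ of the random variables } X_2, \ldots, X_n \text{ are less than } X_1 \\
&\text{and the remaining } n - k \text{ random variables are greater than } X_1\}.
\end{aligned} \tag{2}$$

But, using Fubini's theorem,

$$\begin{aligned}
&P\{X_1 \le x, X_2 < X_1, \ldots, X_k < X_1, X_{k+1} > X_1, \ldots, X_n > X_1\} \\
&\quad = \int_{x_1 = -\infty}^{x} \int_{x_2 = -\infty}^{x_1} \cdots \int_{x_k = -\infty}^{x_1} \int_{x_{k+1} = x_1}^{\infty} \cdots \int_{x_n = x_1}^{\infty} f(x_1) \cdots f(x_n)\,dx_1 \cdots dx_n \\
&\quad = \int_{-\infty}^{x} f(x_1) [F(x_1)]^{k-1} (1 - F(x_1))^{n-k}\,dx_1.
\end{aligned} \tag{3}$$

Now by symmetry, (2) is the sum of $\binom{n-1}{k-1}$ terms, each of which has the
same value as (3), since we may select the $k - 1$ random variables to be less
than $X_1$ in $\binom{n-1}{k-1}$ ways. Also, each term in the summation (1) has the same
value as (2). Thus

$$P\{T_k \le x\} = \int_{-\infty}^{x} n \binom{n-1}{k-1} f(x_1) [F(x_1)]^{k-1} (1 - F(x_1))^{n-k}\,dx_1$$

so that $T_k$ is absolutely continuous, with density

$$f_{T_k}(x) = n \binom{n-1}{k-1} f(x) [F(x)]^{k-1} (1 - F(x))^{n-k}.$$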
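The density of the $k$th smallest variable translates directly into code. The sketch below evaluates it for the uniform density on $[0, 1]$, which is an assumed choice of $f$; the helper `integrate` is hypothetical. In the uniform case $T_k$ has the Beta$(k, n - k + 1)$ density, whose mean is $k/(n+1)$, which gives a convenient check.

```python
import math

def order_statistic_density(x, k, n, f, F):
    """Density of T_k, the kth smallest of n independent variables with density f and d.f. F."""
    return n * math.comb(n - 1, k - 1) * f(x) * F(x) ** (k - 1) * (1 - F(x)) ** (n - k)

# Assumed illustration: the uniform density on [0, 1] and its distribution function.
def f_unif(x):
    return 1.0 if 0 <= x <= 1 else 0.0

def F_unif(x):
    return min(1.0, max(0.0, x))

def integrate(g, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi] (hypothetical helper)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))
```

With $n = 5$, integrating `order_statistic_density(x, k, 5, f_unif, F_unif)` over $[0, 1]$ should give approximately 1 for each $k = 1, \ldots, 5$.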


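We now find the joint distribution function of $T_1, \ldots, T_n$. Let $b_1 < b_2 < \cdots < b_n$. Then

$$\begin{aligned}
P\{T_1 \le b_1, \ldots, T_n \le b_n\}
&= n!\,P\{T_1 \le b_1, \ldots, T_n \le b_n, X_1 < X_2 < \cdots < X_n\} && \text{by symmetry} \\
&= n!\,P\{X_1 \le b_1, X_1 < X_2 \le b_2, X_2 < X_3 \le b_3, \ldots, X_{n-1} < X_n \le b_n\} \\
&= n! \int_{-\infty}^{b_1} f(x_1)\,dx_1 \int_{x_1}^{b_2} f(x_2)\,dx_2 \cdots \int_{x_{n-1}}^{b_n} f(x_n)\,dx_n \\
&= \int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_n} g(x_1, \ldots, x_n)\,dx_1 \cdots dx_n,
\end{aligned}$$

where

$$g(x_1, \ldots, x_n) = \begin{cases} n!\,f(x_1) \cdots f(x_n), & x_1 < x_2 < \cdots < x_n, \\ 0 & \text{elsewhere.} \end{cases}$$

Thus $(T_1, \ldots, T_n)$ is absolutely continuous with density $g$. (Note that $f_{T_k}$ can
be found from $g$ (see 4.8.4), but the calculation is not any simpler than the
direct method we have used above.)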


4.9.4 Example. Let $X$ be an absolutely continuous random variable with
density $f_1$, assumed to be piecewise-continuous. Let $D$ be a Borel subset of $\mathbb{R}$
such that $D$ includes the range of $X$, and let $g$ be a Borel measurable function
from $D$ to $\mathbb{R}$.

If $Y = g \circ X$, we wish to find the distribution of $Y$. [Distribution is a generic
term; to say that we know the distribution of $Y$ means that we know how to
calculate $P\{Y \in B\}$ for all Borel sets $B$. Thus the distribution may be specified
by giving the induced probability measure $P_Y$ or the distribution function $F_Y$.
If $Y$ is absolutely continuous, its density is adequate, and if $Y$ is discrete, the
probability function suffices. If $Y: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$ is an arbitrary random
object, the distribution of $Y$ means the probability measure $P_Y$, defined by
$P_Y(B) = P\{Y \in B\}$, $B \in \mathscr{F}'$.]

Assume that $D$ is an open interval $I$, and $g$ is either strictly increasing
or strictly decreasing, with inverse $h$. Assume also that $g$ has a continuous
nonzero derivative (hence so does $h$). We show that $Y$ is absolutely continuous
with density

$$f_2(y) = \begin{cases} f_1(h(y))\,|h'(y)|, & y \in g(I), \\ 0 & \text{elsewhere.} \end{cases}$$

We compute, for $y \in g(I)$,

$$\begin{aligned}
F_2(y) = P\{Y \le y\} &= P\{\omega: g(X(\omega)) \le y\} \\
&= P\{X \le h(y)\} && \text{if } g \text{ is increasing} \\
&= P\{X \ge h(y)\} && \text{if } g \text{ is decreasing}
\end{aligned}$$

(see Fig. 4.9.2).

[Figure 4.9.2. (a) $g$ strictly increasing; (b) $g$ strictly decreasing.]

Thus

$$F_2(y) = \begin{cases} \displaystyle\int_{-\infty}^{h(y)} f_1(x)\,dx & \text{if } g \text{ is increasing,} \\[2ex] \displaystyle\int_{h(y)}^{\infty} f_1(x)\,dx & \text{if } g \text{ is decreasing.} \end{cases}$$

Therefore

$$\frac{dF_2(y)}{dy} = \begin{cases} f_1(h(y))\,h'(y) & \text{if } g \text{ is increasing,} \\ f_1(h(y))\,(-h'(y)) & \text{if } g \text{ is decreasing,} \end{cases}$$

which equals $f_1(h(y))\,|h'(y)|$ in either case.

Now $F_2$ is continuous everywhere, and has a piecewise-continuous derivative;
it follows that $F_2$ is the integral of its derivative:

$$F_2(y) = \int_{-\infty}^{y} \frac{dF_2(t)}{dt}\,dt$$

(apply the fundamental theorem of calculus). Thus $Y$ is absolutely continuous
with density $f_1(h(y))\,|h'(y)|$.
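As a concrete instance of the change-of-variable formula, take $X$ standard normal and $g(x) = e^x$ on $I = \mathbb{R}$, so that $h(y) = \ln y$ and $h'(y) = 1/y$ on $g(I) = (0, \infty)$. These choices are assumptions made for illustration. The sketch checks that integrating the resulting density reproduces $P\{X \le h(y)\}$.

```python
import math

def f1(x):
    """Standard normal density (assumed choice for the density of X)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def F1(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def f2(y):
    """Density of Y = exp(X): f1(h(y)) |h'(y)| with h = log, and 0 outside (0, infinity)."""
    return f1(math.log(y)) / y if y > 0 else 0.0

def F2_numeric(y, steps=200_000):
    """Midpoint-rule approximation of the integral of f2 over (0, y]."""
    if y <= 0:
        return 0.0
    h = y / steps
    return h * sum(f2((i + 0.5) * h) for i in range(steps))
```

Since $g$ is increasing here, `F2_numeric(y)` should agree with `F1(math.log(y))` for $y > 0$.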
Examples of this type in which $g$ is more complicated will be considered
in the problems.

Problems


1. Consider Example 4.9.4, but weaken the hypothesis so that the domain
of $g$ is the union of closed intervals $I_1, \ldots, I_n$, such that on the interior
of each $I_j$, $g$ has a continuous nonzero derivative and is either strictly
increasing or strictly decreasing, with inverse $h_j$. Show that $Y$ is absolutely
continuous with density

$$f_2(y) = \sum_{j=1}^n f_1(h_j(y))\,|h_j'(y)|,$$

where $f_1(h_j(y))\,|h_j'(y)|$ is interpreted as 0 if $y$ does not belong to the
domain of $h_j$.

2. Let $X$ be an absolutely continuous random variable with density
$f_1(x) = x^3/64$, $0 \le x \le 4$; $f_1(x) = 0$ elsewhere. Define a random variable $Y$ by
$Y = \min(\sqrt{X}, 2 - \sqrt{X})$. Find the density of $Y$.

3. Let $X$, $Y$, and $Z$ be independent random variables, each uniformly distributed
between 0 and 1. Find the probability that $Z^2 \le XY$.

4. Let $X$ be a random $n$-vector with density $f_1$, and let $Y = g \circ X$, where
$g: \mathbb{R}^n \to \mathbb{R}^n$ (or $g: D \to \mathbb{R}^n$, where $D$ is open in $\mathbb{R}^n$ and $P\{X \in D\} = 1$).
Assume $g$ to be one-to-one and continuously differentiable with a nonzero
Jacobian $J_g$ (hence $g$ has a continuously differentiable inverse $h$). Show
that $Y$ is absolutely continuous with density

$$f_2(y) = f_1(h(y))\,|J_h(y)| = \frac{f_1(h(y))}{|J_g(x)|_{x = h(y)}}.$$

5. Let $X$ and $Y$ be independent random variables, each normally distributed
with $m = 0$ and the same $\sigma$. Define random variables $R$ and $\Theta$ by
$X = R\cos\Theta$, $Y = R\sin\Theta$. Show that $R$ and $\Theta$ are independent, and find
their density functions (use Problem 4).

6. Let $X_1, \ldots, X_n$ be independent random variables, each with density $f$.
Let $X_0$ be the number of random variables among $X_1, \ldots, X_n$ that exceed
the smallest of the $X_i$ by more than 2. Find $\int_\Omega X_0\,dP$. (Leave the answer
in the form of an integral on the real line.) Hint: Express $X_0$ as a sum of
indicators.

7. Let $\Omega = [0, 1]$, $\mathscr{F}$ = Borel sets, $P$ = Lebesgue measure. Show that
$(\Omega, \mathscr{F}, P)$ is a universal probability space in the sense that if $F$ is any proper
distribution function on $\mathbb{R}$ [“proper” means that $F(\infty) = 1$, $F(-\infty) = 0$],
there is a random variable $X$ on $(\Omega, \mathscr{F}, P)$ with distribution function $F$.
(Define $F^{-1}(y) = \sup\{x: F(x) < y\}$, $0 < y < 1$, and take $X(\omega) = F^{-1}(\omega)$,
with $X(0)$ and $X(1)$ arbitrary.)


4.10 EXPECTATION

Let $X$ be a simple random variable on $(\Omega, \mathscr{F}, P)$, taking the values $x_1, \ldots, x_n$
with probabilities $p_1, \ldots, p_n$. If the random experiment is repeated independently
$N$ times, $N$ very large, $X$ will take the value $x_i$ roughly $N p_i$ times,
so the arithmetic average of the values of $X$ in the $N$ observations is roughly

$$\frac{1}{N}[N p_1 x_1 + N p_2 x_2 + \cdots + N p_n x_n] = \sum_{i=1}^n p_i x_i.$$

This is a reasonable figure for the average value of $X$. If $X$ is represented
as $\sum_i x_i I_{B_i}$, where the $B_i$ are disjoint sets in $\mathscr{F}$, the average value may be
expressed as $\sum_i x_i P(B_i)$, which is $\int_\Omega X\,dP$. Since arbitrary random variables
are ultimately built up from simple ones, it is reasonable to take $\int_\Omega X\,dP$ as
the definition of the average value (henceforth to be called the “expectation”)
of $X$.


4.10.1 Definition. If $X$ is a random variable on $(\Omega, \mathscr{F}, P)$, the expectation
of $X$ is defined by

$$E(X) = \int_\Omega X\,dP$$

provided the integral exists. Thus $E(X)$ is the integral of the Borel measurable
function $X$ with respect to the probability measure $P$, so that all the results
of integration theory are applicable. The same definition is used if $X$ is an
extended random variable.

In many situations it is inconvenient to compute $E(X)$ by integrating over
$\Omega$; the following result expresses $E(X)$ as an integral with respect to the
induced probability measure $P_X$, which in turn is determined by the distribution
function $F$.

First, a word about notation. If $F$ is a distribution function on $\mathbb{R}^n$ with corresponding
Lebesgue-Stieltjes measure $\mu$, and $g: (\mathbb{R}^n, \mathscr{B}) \to (\mathbb{R}, \mathscr{B})$, then
$\int_{\mathbb{R}^n} g(x)\,dF(x)$ means $\int_{\mathbb{R}^n} g\,d\mu$; it is not a Riemann-Stieltjes integral.


4.10.2 Theorem. Let $X$ be a random variable on $(\Omega, \mathscr{F}, P)$, with distribution
function $F$. Let $g$ be a Borel measurable function from $\mathbb{R}$ to $\mathbb{R}$.
If $Y = g \circ X$, then

$$E(Y) = \int_{\mathbb{R}} g(x)\,dF(x) \qquad \left[= \int_{\mathbb{R}} g\,dP_X\right]$$

in the sense that if either of the two sides exists, so does the other, and the
two sides are equal.

PROOF. We use the basic technique of starting with indicators and proceeding
to more complicated functions.
Let $g$ be an indicator $I_B$, $B \in \mathscr{B}(\mathbb{R})$. Then

$$E(Y) = E(I_B \circ X) = E(I_{\{X \in B\}}) = P_X(B) = \int_{\mathbb{R}} g\,dP_X$$

so that $E(Y)$ and $\int_{\mathbb{R}} g\,dP_X$ exist and are equal.

Now let $g$ be a nonnegative simple function, say, $g(x) = \sum_{j=1}^n x_j I_{B_j}(x)$, the
$B_j$ disjoint sets in $\mathscr{B}(\mathbb{R})$. Then

$$\begin{aligned}
E(Y) &= \sum_{j=1}^n x_j E(I_{B_j} \circ X) = \sum_{j=1}^n x_j \int_{\mathbb{R}} I_{B_j}\,dP_X && \text{by what we have just proved} \\
&= \int_{\mathbb{R}} \sum_{j=1}^n x_j I_{B_j}\,dP_X && \text{since } g \ge 0 \\
&= \int_{\mathbb{R}} g\,dP_X.
\end{aligned}$$

Again, both integrals exist and are equal.
If $g$ is a nonnegative Borel measurable function, let $g_1, g_2, \ldots$ be nonnegative
simple functions with $g_n \uparrow g$. We have just proved that

$$E(g_n \circ X) = \int_{\mathbb{R}} g_n\,dP_X;$$

hence by the monotone convergence theorem,

$$E(g \circ X) = \int_{\mathbb{R}} g\,dP_X$$

and again both integrals exist and are equal.
Finally, if $g = g^+ - g^-$ is an arbitrary Borel measurable function and
$Y = g \circ X$, we have

$$\begin{aligned}
E(Y) &= E(Y^+) - E(Y^-) = E(g^+ \circ X) - E(g^- \circ X) \\
&= \int_{\mathbb{R}} g^+\,dP_X - \int_{\mathbb{R}} g^-\,dP_X && \text{by what we have already proved} \\
&= \int_{\mathbb{R}} g\,dP_X.
\end{aligned}$$
R 


If $E(Y)$ exists and, say, $E(Y^-)$ is finite, then $\int_{\mathbb{R}} g^-\,dP_X$ is finite, and hence
$\int_{\mathbb{R}} g\,dP_X$ exists; by the same reasoning, the existence of $\int_{\mathbb{R}} g\,dP_X$ implies that
of $E(Y)$. □

4.10.3 Corollaries and Extensions. (a) Let $X$ be a random vector on
$(\Omega, \mathscr{F}, P)$, and let $g$ be a Borel measurable function from $\mathbb{R}^n$ to $\mathbb{R}$. Then
$E(g \circ X) = \int_{\mathbb{R}^n} g(x)\,dF(x)$ in the sense that if either integral exists, so does the
other, and the two are equal.

The proof is exactly as in 4.10.2, with $\mathbb{R}$ replaced by $\mathbb{R}^n$.

(b) More generally, let $X$ be a random object on $(\Omega, \mathscr{F}, P)$, that is,
$X: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$, where $(\Omega', \mathscr{F}')$ is an arbitrary measurable space. Let
$g$ be a Borel measurable real (or extended real) valued function on $(\Omega', \mathscr{F}')$.
Let $P_X$ be the probability measure induced by $X$:

$$P_X(B) = P\{\omega: X(\omega) \in B\}, \qquad B \in \mathscr{F}'.$$

Then

$$E(g \circ X) = \int_{\Omega'} g\,dP_X$$

in the sense that if either integral exists, so does the other, and the two are
equal.

Again, the proof is just as in 4.10.2, with $\mathbb{R}$ replaced by $\Omega'$ and $\mathscr{B}(\mathbb{R})$
by $\mathscr{F}'$.

(c) If $X$ is a random variable (or random vector) with density $f$, then

$$\int g(x)\,dF(x) = \int g(x) f(x)\,dx$$

(integration over $\mathbb{R}$ in the case of a random variable, and over $\mathbb{R}^n$ in the case
of a random vector) in the sense that if either integral exists, then so does the
other, and the two are equal.

When $g$ is an indicator $I_B$, this says that $P_X(B) = \int_B f(x)\,dx$, which holds
for any Borel set $B$. The proof is completed by passing in turn to nonnegative
simple functions, nonnegative Borel measurable functions, and arbitrary Borel
measurable functions.

(d) If $X$ is a discrete random variable with probability function $p$, then

$$\int g(x)\,dF(x) = \sum_x g(x) p(x),$$

where the series is interpreted as $\sum_x g^+(x) p(x) - \sum_x g^-(x) p(x)$, and again
the interpretation is that the integral exists iff the sum exists (that is,

$$\sum_x g^+(x) p(x) < \infty \quad \text{or} \quad \sum_x g^-(x) p(x) < \infty),$$

and in this case the two are equal.
This is proved by starting with indicators as before.
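In the discrete case the two ways of computing $E(g \circ X)$ can be compared exactly on a finite probability space: summing over $\Omega$ with respect to $P$, or summing over the values of $X$ with respect to the induced measure $P_X$. The sketch below assumes a fair die as the sample space and hypothetical choices of $X$ and $g$.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed finite probability space: a fair die.
OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}

def X(w):
    """A random variable on OMEGA (hypothetical choice)."""
    return w - 3

def g(x):
    """A Borel measurable function on the reals (hypothetical choice)."""
    return x * x

def expectation_over_omega(Y):
    """E(Y) as the integral of Y with respect to P, a finite sum here."""
    return sum(Y(w) * P[w] for w in OMEGA)

def induced_measure(X):
    """The probability function of X: p(x) = P{w: X(w) = x}."""
    PX = defaultdict(Fraction)
    for w in OMEGA:
        PX[X(w)] += P[w]
    return dict(PX)

def expectation_over_range(g, PX):
    """The sum of g(x) p(x) over the values x of X."""
    return sum(g(x) * p for x, p in PX.items())
```

Both `expectation_over_omega(lambda w: g(X(w)))` and `expectation_over_range(g, induced_measure(X))` use exact rational arithmetic, so they can be compared with `==`.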


Expectations of certain functions of $X$ are of special interest.

4.10.4 Definition. Let $X$ be a random variable on $(\Omega, \mathscr{F}, P)$. If $k > 0$,
the number $E(X^k)$ is called the $k$th moment of $X$; $E[|X|^k]$ is called the $k$th
absolute moment of $X$. $E[(X - E(X))^k]$ is called the $k$th central moment;
$E[|X - E(X)|^k]$ the $k$th absolute central moment; central moments are defined
only when $E(X)$ is finite.

The first moment ($k = 1$) is $E(X)$, sometimes called the mean of $X$, and
the first central moment (if it exists) is always 0. The second central moment
$\sigma^2 = E[(X - E(X))^2]$ is called the variance of $X$, sometimes written $\operatorname{Var} X$,
and the positive square root $\sigma$ the standard deviation.

Note that $E(X^k)$ is finite iff $E[|X|^k]$ is finite, by 1.6.4(b). Also, finiteness of
the $k$th moments implies finiteness of lower moments, as we now prove.


4.10.5 Lemma. If $k > 0$ and $E(X^k)$ is finite, then $E(X^j)$ is finite for
$0 < j < k$.

FIRST PROOF.

$$\begin{aligned}
E(|X|^j) = \int_\Omega |X|^j\,dP &= \int_{\{|X|^j < 1\}} |X|^j\,dP + \int_{\{|X|^j \ge 1\}} |X|^j\,dP \\
&\le P\{|X|^j < 1\} + \int_\Omega |X|^k\,dP < \infty. \qquad \square
\end{aligned}$$

SECOND PROOF. We have $\|X\|_j \le \|X\|_k$ for $0 < j < k$ (8.2.4). □


Central moments of integral order can be obtained from moments, as follows.


4.10.6 Lemma. If $n$ is a positive integer greater than 1, $E(X^{n-1})$ is finite,
and $E(X^n)$ exists, then $E[(X - E(X))^n] = \sum_{k=0}^n \binom{n}{k} (-E(X))^{n-k} E(X^k)$. In particular,
if $E(X)$ is finite [$E(X^2)$ always exists since $X^2 \ge 0$], then

$$\operatorname{Var} X = E(X^2) - [E(X)]^2.$$

PROOF. Use the binomial theorem and the additivity theorem for integrals
(1.6.3). □
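The identity in Lemma 4.10.6 can be verified exactly for a small discrete distribution. The sketch below assumes $X$ uniform on $\{1, \ldots, 6\}$ and compares the central moment computed directly with the binomial expansion in terms of ordinary moments; all names are illustrative.

```python
from fractions import Fraction
from math import comb

# Assumed example: X uniform on {1, ..., 6}.
VALUES = {x: Fraction(1, 6) for x in range(1, 7)}

def moment(k):
    """E(X^k)."""
    return sum(p * x ** k for x, p in VALUES.items())

def central_moment_direct(n):
    """E[(X - E(X))^n], computed from the definition."""
    m = moment(1)
    return sum(p * (x - m) ** n for x, p in VALUES.items())

def central_moment_from_moments(n):
    """Sum over k of C(n, k) (-E(X))^(n-k) E(X^k), as in Lemma 4.10.6."""
    m = moment(1)
    return sum(comb(n, k) * (-m) ** (n - k) * moment(k) for k in range(n + 1))
```

For $n = 2$ this reduces to $\operatorname{Var} X = E(X^2) - [E(X)]^2$.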


A similar formula expresses moments in terms of central moments. (Write
$X^n = (X - E(X) + E(X))^n$ and use the binomial theorem.)
We now restate a result proved earlier in a measure-theoretic context.

4.10.7 Chebyshev's Inequality. (a) If $X$ is a nonnegative random variable,
$0 < p < \infty$ and $0 < \varepsilon < \infty$,

$$P\{X \ge \varepsilon\} \le \frac{E(X^p)}{\varepsilon^p}.$$

(b) If $X$ is a random variable with finite mean $m$ and variance $\sigma^2$, and
$0 < k < \infty$,

$$P\{|X - m| \ge k\sigma\} \le \frac{1}{k^2}.$$

This is a quantitative result to the effect that a random variable with small
variance is likely to be close to its mean.

PROOF. See 2.4.9. □
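Part (b) of Chebyshev's inequality can be checked exactly for a specific distribution. The sketch below assumes $X$ is the number of heads in 10 fair coin tosses and compares the exact tail probability with the bound $1/k^2$. Passing $k$ as a `Fraction` keeps every comparison exact.

```python
from fractions import Fraction
from math import comb

# Assumed example: X = number of heads in N fair coin tosses.
N = 10
PMF = {x: Fraction(comb(N, x), 2 ** N) for x in range(N + 1)}

mean = sum(x * p for x, p in PMF.items())
variance = sum((x - mean) ** 2 * p for x, p in PMF.items())

def tail_probability(k):
    """P{|X - m| >= k*sigma}, computed exactly by comparing squares."""
    return sum(p for x, p in PMF.items() if (x - mean) ** 2 >= k * k * variance)

def chebyshev_bound(k):
    """The bound 1/k^2 from 4.10.7(b)."""
    return 1 / (k * k)
```

In this example the bound is far from tight, which is typical: Chebyshev's inequality uses only the mean and variance of the distribution.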
A normally distributed random variable has the useful property that the
distribution is completely determined by the mean and variance. Specifically,
if $X$ has the normal density, that is,

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right],$$

then $m = E(X)$ and $\sigma^2 = \operatorname{Var} X$; the computation is straightforward, using the
standard integrals

$$\int_{-\infty}^{\infty} \exp(-x^2)\,dx = \sqrt{\pi} \quad \text{and} \quad \int_{-\infty}^{\infty} x^2 \exp(-x^2)\,dx = \tfrac{1}{2}\sqrt{\pi}.$$

The phrase “normal $(m, \sigma^2)$” is used for a random variable that is normally
distributed with mean $m$ and variance $\sigma^2$.
The following result on the expectation of a product of independent random
variables is a direct consequence of Fubini's theorem.

4.10.8 Theorem. Let $X_1, \ldots, X_n$ be independent random variables on
$(\Omega, \mathscr{F}, P)$. If all $X_i$ are nonnegative or if $E(X_i)$ is finite for all $i$, then $E(X_1 \cdots X_n)$
exists and equals $E(X_1) E(X_2) \cdots E(X_n)$.

PROOF. If all $X_i \ge 0$, then by 4.10.3(a),

$$E(X_1 \cdots X_n) = \int_{\mathbb{R}^n} x_1 \cdots x_n\,dP_X(x_1, \ldots, x_n) \qquad \text{where } X = (X_1, \ldots, X_n).$$

Since $P_X$ is the product of the $P_{X_i}$ (see 4.8.1), Fubini's theorem yields

$$E(X_1 \cdots X_n) = \int_{\mathbb{R}} x_1\,dP_{X_1}(x_1) \cdots \int_{\mathbb{R}} x_n\,dP_{X_n}(x_n) = E(X_1) \cdots E(X_n).$$

(This can also be proved without Fubini's theorem by starting with indicators
and proceeding to nonnegative simple functions and then nonnegative
measurable functions, but the present proof is faster. Note also that the result
holds for extended random variables, with the same proof.)

If all $E(X_i)$ are finite, the above argument shows that

$$E(|X_1 \cdots X_n|) = \prod_{i=1}^n E(|X_i|) < \infty,$$

and thus Fubini's theorem may be applied just as in the first part of the
proof. □


Theorem 4.10.8 can be extended to complex-valued random variables. Recall
from 2.4 that a complex-valued random variable $X$ on $(\Omega, \mathscr{F}, P)$ is given
by $X = X_1 + iX_2$ where $X_1 = \operatorname{Re} X$ and $X_2 = \operatorname{Im} X$ are (real-valued) random
variables. In view of the discussion at the beginning of 4.7, we may regard $X$ as
simply a two-dimensional random vector. We define $E(X) = E(X_1) + iE(X_2)$
provided $E(X_1)$ and $E(X_2)$ are both finite.


4.10.9 Theorem. If $X_1, \ldots, X_n$ are independent complex-valued random
variables and $E(X_i)$ is finite for all $i$, then $E(X_1 \cdots X_n)$ is finite and equals
$E(X_1) \cdots E(X_n)$.

PROOF. First, let $n = 2$, $X_1 = Y_1 + iZ_1$, $X_2 = Y_2 + iZ_2$. By 4.8.2(d), $Y_1$ and
$Y_2$ are independent, as are $Y_1$ and $Z_2$, $Z_1$ and $Y_2$, and $Z_1$ and $Z_2$. By 4.10.8,

$$\begin{aligned}
E(X_1 X_2) &= E(Y_1)E(Y_2) - E(Z_1)E(Z_2) + iE(Y_1)E(Z_2) + iE(Z_1)E(Y_2) \\
&= (E(Y_1) + iE(Z_1))(E(Y_2) + iE(Z_2)) = E(X_1)E(X_2).
\end{aligned}$$

Now let $n > 2$, $X_j = Y_j + iZ_j$, $j = 1, \ldots, n$, and assume the result has been
established for $n - 1$ random variables. If $V = (Y_1, Z_1, Y_2, Z_2, \ldots, Y_{n-1}, Z_{n-1})$
and $W = (Y_n, Z_n)$, we claim that $V$ and $W$ are independent. By
independence of $X_1, \ldots, X_n$, $P\{V \in A, W \in B\} = P\{V \in A\}P\{W \in B\}$ when
$A$ and $B$ are measurable rectangles. Pass from measurable rectangles to finite
disjoint unions of measurable rectangles, and then, by means of the monotone
class theorem, to arbitrary Borel sets.

The independence of $V$ and $W$ implies, by 4.8.2(d), that $X_1 \cdots X_{n-1}$ and $X_n$ are
independent, so that $E(X_1 \cdots X_n) = E(X_1 \cdots X_{n-1})E(X_n) = E(X_1) \cdots E(X_n)$ by
the induction hypothesis. □

Theorem 4.10.8 implies that the variance of a sum of independent random
variables is the sum of the variances. Actually, a somewhat more general result
may be derived. We first introduce some new terminology.


4.10.10 Definitions and Comments. Let $X$ and $Y$ be random variables with
finite expectation, and assume $E(XY)$ is also finite. (In particular, if $X$ and $Y$
have finite second moments, $E(XY)$ is finite by the Cauchy-Schwarz inequality.)
The covariance of $X$ and $Y$ is defined by

$$\operatorname{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).$$

If $X$ and $Y$ are independent, then $\operatorname{Cov}(X, Y) = 0$ by 4.10.8; however, the
converse is not true (consider $X = \cos\theta$, $Y = \sin\theta$, where $\theta$ is uniformly
distributed between 0 and $2\pi$).
If the variances $\sigma_X^2$ and $\sigma_Y^2$ are finite and greater than 0, the correlation
coefficient between $X$ and $Y$ is defined by

$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$

By the Cauchy-Schwarz inequality applied to $X - E(X)$ and $Y - E(Y)$,
$-1 \le \rho(X, Y) \le 1$. Furthermore (see Problem 6, Section 2.4), $|\rho(X, Y)| = 1$
iff $X' = X - E(X)$ and $Y' = Y - E(Y)$ are linearly dependent, that is,
$P\{aX' + bY' = 0\} = 1$ for some real numbers $a$ and $b$, not both 0.
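The remark that zero covariance does not imply independence can be made concrete. The text's example uses $\theta$ uniform on $(0, 2\pi)$. The sketch below swaps in a discrete analogue so that everything can be computed exactly: $\theta$ is assumed uniform on $\{0, \pi/2, \pi, 3\pi/2\}$, so $(\cos\theta, \sin\theta)$ takes four values with probability $1/4$ each. All names are illustrative.

```python
from fractions import Fraction

# Discrete analogue (an assumption): theta uniform on {0, pi/2, pi, 3*pi/2},
# so (X, Y) = (cos theta, sin theta) takes these four values equally often.
JOINT = {
    (1, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (-1, 0): Fraction(1, 4),
    (0, -1): Fraction(1, 4),
}

def E(h):
    """Expectation of h(X, Y) under the joint probability function."""
    return sum(h(x, y) * p for (x, y), p in JOINT.items())

def cov():
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    return E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

def prob(event):
    """P{(X, Y) satisfies event}."""
    return sum(p for (x, y), p in JOINT.items() if event(x, y))
```

Here `cov()` is 0, yet $P\{X = 1, Y = 0\} = 1/4$ while $P\{X = 1\}P\{Y = 0\} = 1/8$, so $X$ and $Y$ are not independent.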


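We now look at the variance of a sum.


4.10.11 Theorem.
If $X_1, \ldots, X_n$ are random variables with finite expectation, and $E(X_i X_j)$ is
finite for all $i, j$ with $i \ne j$, then

$$\operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^n \operatorname{Var} X_i + 2 \sum_{\substack{i, j = 1 \\ i < j}}^n \operatorname{Cov}(X_i, X_j).$$

Thus if $X_1, \ldots, X_n$ are mutually uncorrelated, that is, $\operatorname{Cov}(X_i, X_j) = 0$ for
$i \ne j$; in particular, if $X_1, \ldots, X_n$ are independent, then

$$\operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^n \operatorname{Var} X_i.$$

PROOF.

$$\begin{aligned}
\operatorname{Var}(X_1 + \cdots + X_n)
&= E[(X_1 + \cdots + X_n - E(X_1) - \cdots - E(X_n))^2] \\
&= \sum_{i=1}^n E[(X_i - E(X_i))^2] + 2 \sum_{\substack{i, j = 1 \\ i < j}}^n E[(X_i - E(X_i))(X_j - E(X_j))]. \qquad \square
\end{aligned}$$

4.10.12 Corollary. Under the hypothesis of 4.10.11, if $a_1, \ldots, a_n, b$ are arbitrary
real numbers,

$$\operatorname{Var}(a_1 X_1 + \cdots + a_n X_n + b) = \sum_{i=1}^n a_i^2 \operatorname{Var} X_i + 2 \sum_{\substack{i, j = 1 \\ i < j}}^n a_i a_j \operatorname{Cov}(X_i, X_j).$$

PROOF. This follows from 4.10.11, along with the observations, which may
be verified from the definitions, that $\operatorname{Var}(aX + b) = a^2 \operatorname{Var} X$ and
$\operatorname{Cov}(a_i X_i, a_j X_j) = a_i a_j \operatorname{Cov}(X_i, X_j)$. □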
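The variance formula of 4.10.12 can be verified exactly on a finite probability space. The sketch below assumes two fair dice and three random variables on that space (the two dice and their maximum, the last being dependent on the others). It compares the variance of a linear combination computed directly with the expression in terms of variances and covariances; all names and coefficients are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed finite probability space: two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))
P_POINT = Fraction(1, len(OMEGA))

def E(Y):
    """Expectation of a random variable Y on OMEGA."""
    return sum(Y(w) * P_POINT for w in OMEGA)

def cov(X, Y):
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    return E(lambda w: X(w) * Y(w)) - E(X) * E(Y)

def var(X):
    """Var X = Cov(X, X)."""
    return cov(X, X)

def var_linear_direct(coeffs, Xs, b):
    """Variance of a_1 X_1 + ... + a_n X_n + b, computed from the definition."""
    S = lambda w: sum(a * X(w) for a, X in zip(coeffs, Xs)) + b
    return var(S)

def var_linear_formula(coeffs, Xs):
    """Right-hand side of 4.10.12: variances plus twice the covariances with i < j."""
    n = len(Xs)
    diag = sum(coeffs[i] ** 2 * var(Xs[i]) for i in range(n))
    cross = sum(coeffs[i] * coeffs[j] * cov(Xs[i], Xs[j])
                for i in range(n) for j in range(i + 1, n))
    return diag + 2 * cross

d1 = lambda w: w[0]     # first die
d2 = lambda w: w[1]     # second die
mx = lambda w: max(w)   # maximum of the two dice
```

Note that the constant $b$ does not appear in `var_linear_formula`, reflecting $\operatorname{Var}(aX + b) = a^2 \operatorname{Var} X$.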


Problems 


1. Let $X$ be a discrete random variable, with $P\{X = n\} = (\tfrac{1}{2})^n$, $n = 1, 2, \ldots$.
Let $Y = g \circ X$, where $g(n) = (-1)^{n+1} 2^n / n$. Show that $E(Y)$ does not exist,
although the series $\sum_{n=1}^{\infty} g(n) P\{X = n\}$ is conditionally convergent.

2. Let X be a random variable with the distribution function shown in 
Fig. 4.10.1. Compute E(X’). 


Figure 4.10.1. 


3. Suppose $X$ is a random variable with distribution function $F_X$, and
$Y = g \circ X$, $g: \mathbb{R} \to \mathbb{R}$, Borel measurable. It is desired to find the expectation
of $Y$. One student evaluates $\int_{-\infty}^{\infty} g(x)\,dF_X(x)$; another first finds the
distribution function of $Y$, that is,

$$F_Y(y) = P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}$$

and then evaluates $\int_{-\infty}^{\infty} y\,dF_Y(y)$. Will the answers be the same?


4. Show that the random variables $X_1, \ldots, X_n$ are independent iff for all
Borel measurable $g_i: \mathbb{R} \to \mathbb{R}$ such that $g_i \ge 0$, we have

$$E[g_1(X_1) \cdots g_n(X_n)] = \prod_{i=1}^n E[g_i(X_i)], \qquad \text{where } g_i(X_i) = g_i \circ X_i. \tag{1}$$

Also show that $X_1, \ldots, X_n$ are independent iff (1) holds for all Borel
measurable $g_i: \mathbb{R} \to \mathbb{R}$ such that $E[g_i(X_i)]$ is finite for each $i$.

5. Let $X$ be an extended random variable on $(\Omega, \mathscr{F}, P)$ and let $g$ be a Borel measurable
function from $\overline{\mathbb{R}}$ to $\overline{\mathbb{R}}$. Define $F(x) = P\{-\infty < X \le x\}$,
$x \in \mathbb{R}$; thus $F(\infty) - F(-\infty) = P\{X \text{ finite}\} \le 1$. Show that if $E[g \circ X]$ exists,
$E[g \circ X] = \int_{-\infty}^{\infty} g(x)\,dF(x) + g(\infty)P\{X = \infty\} + g(-\infty)P\{X = -\infty\}$.


4.11 INFINITE SEQUENCES OF RANDOM VARIABLES

Very often we shall be interested in a sequence of random variables $X_1, X_2, \ldots, X_n$,
where $n$ is arbitrarily large and not fixed in advance. Thus it is
convenient to have a single probability space on which we can define an infinite
sequence of random variables. The discussion of measures on infinite product
spaces (2.7) provides the necessary machinery. In particular, we can require
that $X_1, X_2, \ldots$ be independent, with $X_i$ having a specified distribution. In fact
the $X_i$ can be random objects with values in an arbitrary measurable space.


4.11.1 Theorem. Let $(\Omega_j, \mathscr{F}_j, P_j)$, $j = 1, 2, \ldots$ be an arbitrary sequence of
probability spaces. There exists a probability space $(\Omega, \mathscr{F}, P)$ and a sequence
of independent random objects $X_j: (\Omega, \mathscr{F}) \to (\Omega_j, \mathscr{F}_j)$ such that

$$P\{X_j \in B\} = P_j(B) \qquad \text{for all } B \in \mathscr{F}_j,\ j = 1, 2, \ldots.$$

To obtain a sequence of independent random variables with specified distribution
functions $F_1, F_2, \ldots$, we take $\Omega_j = \mathbb{R}$, $\mathscr{F}_j = \mathscr{B}(\mathbb{R})$, with $P_j$ the
Lebesgue-Stieltjes measure corresponding to $F_j$.

PROOF. Let $\Omega = \prod_j \Omega_j$, $\mathscr{F} = \prod_j \mathscr{F}_j$, $P = \prod_j P_j$ (see 2.7.3). If
$\omega = (\omega_1, \omega_2, \ldots) \in \Omega$, let $X_j(\omega) = \omega_j$, $j = 1, 2, \ldots$. If $B \in \mathscr{F}_j$, then
$\{\omega: X_j(\omega) \in B\} = \{\omega: \omega_j \in B\}$, a measurable rectangle. Thus $\{\omega: X_j(\omega) \in B\} \in \mathscr{F}$,
so that the $X_j$ are random objects. By 2.7.3,

$$P\{X_1 \in A_1, \ldots, X_n \in A_n\} = \prod_{j=1}^n P_j(A_j) \qquad \text{if } A_j \in \mathscr{F}_j,\ 1 \le j \le n.$$

Take $A_i = \Omega_i$, $i \ne j$, to conclude that $P\{X_j \in A_j\} = P_j(A_j)$. Therefore
$P\{X_1 \in A_1, \ldots, X_n \in A_n\} = \prod_{j=1}^n P\{X_j \in A_j\}$, proving independence. □


The results of 2.7 may also be used to provide the underlying probability
space for a Markov chain. Let $S$ be a finite or countably infinite set, called the
state space; for convenience we may take $S$ to be a subset of the integers. Let
$\Pi = [p_{ij}]$, $i, j \in S$, be a stochastic matrix, that is, $p_{ij} \ge 0$ for all $i, j \in S$, and
$\sum_j p_{ij} = 1$ for all $i \in S$. Let $p_i$, $i \in S$, be a set of nonnegative numbers adding
to 1 (the initial distribution). We envision a process that starts at time $t = 0$
in an initial state $X_0$, where $P\{X_0 = i\} = p_i$, $i \in S$, and makes transitions at
times $t = 1, 2, \ldots$ in accordance with the following rule. If $X_n$ denotes the
state at time $n$, and we know that $X_n = i$, then regardless of the past history,
in other words, regardless of how the process happened to arrive at state $i$, the
probability of moving to state $j$ at time $n + 1$ is $p_{ij}$. We expect from 4.5.1(b)
that for all $i_0, i_1, \ldots, i_n \in S$, $n = 0, 1, 2, \ldots$,

\[ P\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = p_{i_0}\, p_{i_0 i_1} \cdots p_{i_{n-1} i_n}. \]


We now show that it is possible to construct a sequence of random variables 
satisfying this requirement. 


4.11.2 Theorem. For a given state space $S$, stochastic matrix $\Pi = [p_{ij}]$,
$i, j \in S$, and initial distribution $p_i$, $i \in S$, there is a sequence of random
variables $X_0, X_1, \ldots$, all defined on the same probability space and taking values
in $S$, such that

\[ P\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = p_{i_0}\, p_{i_0 i_1} \cdots p_{i_{n-1} i_n} \tag{1} \]

for all $i_0, \ldots, i_n \in S$ and all $n = 0, 1, \ldots$.


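PROOF. Let $\mathscr{S}$ consist of all subsets of $S$, and take $\Omega_j = S$, $\mathscr{F}_j = \mathscr{S}$ for each $j$. Define

\[ P_0(B) = \sum_{i \in B} p_i, \quad B \in \mathscr{S}, \]
\[ P(i_0, \ldots, i_{n-1}, B) = \sum_{j \in B} p_{i_{n-1} j}, \quad B \in \mathscr{S},\ i_0, \ldots, i_{n-1} \in S. \]

Since $\mathscr{S}^{n}$ consists of all subsets of $S^{n}$, the measurability requirements of
Theorem 2.7.2 are automatically satisfied. If we define $X_n(\omega_0, \omega_1, \ldots) = \omega_n$,
$n = 0, 1, \ldots$, we obtain

\[ P\{X_0 = i_0, \ldots, X_n = i_n\} = \int_S P_0(d\omega_0) \int_S P(\omega_0, d\omega_1) \cdots \int_S I_B(\omega_0, \ldots, \omega_n)\, P(\omega_0, \ldots, \omega_{n-1}, d\omega_n), \]

where $B = \{(\omega_0, \ldots, \omega_n): \omega_0 = i_0, \ldots, \omega_n = i_n\}$. Now if $\mu$ is a measure on $\mathscr{S}$
with $\mu\{j\} = q_j$, $j \in S$, and $f: S \to \mathbb{R}$, then $\int_S f\,d\mu = \sum_{j \in S} q_j f(j)$ (see 2.4.12).
Thus

\[ \int_S I_B(\omega_0, \ldots, \omega_n)\, P(\omega_0, \ldots, \omega_{n-1}, d\omega_n)
 = I_B(\omega_0, \ldots, \omega_{n-1}, i_n)\, P(\omega_0, \ldots, \omega_{n-1}, \{i_n\})
 = I_B(\omega_0, \ldots, \omega_{n-1}, i_n)\, p_{i_{n-1} i_n}. \]

We may continue this process to obtain

\[ P\{X_0 = i_0, \ldots, X_n = i_n\} = p_{i_0}\, p_{i_0 i_1} \cdots p_{i_{n-1} i_n}. \quad \square \]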


A sequence of random variables $\{X_n\}$ satisfying (1) of 4.11.2 is called a
Markov chain corresponding to the matrix $\Pi$ and the initial distribution $\{p_i\}$;
$\Pi$ is called the transition matrix of the chain and the numbers $p_{ij}$ are called
transition probabilities.
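
The product formula (1) of 4.11.2 can be illustrated with a short computation. The following is a minimal sketch, not part of the original text: the two-state chain, its initial distribution, and the function name `path_probability` are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def path_probability(initial, transition, path):
    """Return p_{i0} * p_{i0 i1} * ... * p_{i(n-1) in} for the given path."""
    prob = initial[path[0]]
    for prev, nxt in zip(path, path[1:]):
        prob *= transition[prev][nxt]
    return prob


# Hypothetical two-state chain on S = {0, 1} (values chosen for illustration).
initial = [Fraction(1, 3), Fraction(2, 3)]
transition = [
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 4), Fraction(3, 4)],
]

# P{X_0 = 0, X_1 = 1, X_2 = 1} = (1/3)(1/2)(3/4)
example = path_probability(initial, transition, (0, 1, 1))

# Summing over all paths of a fixed length must give total probability 1.
total = sum(path_probability(initial, transition, path)
            for path in product(range(2), repeat=4))
```

Exact rational arithmetic is used so that the check that the path probabilities add to 1 is not blurred by rounding.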

The basic properties of Markov chains are discussed by Ash (1970, Chapters
6 and 7). The symmetric random walk in $\mathbb{R}^k$, an important special case,
is considered in Appendix 1.

If $X_1, \ldots, X_n$ are random variables on $(\Omega, \mathscr{F}, P)$, the random vector
$X = (X_1, \ldots, X_n)$ is a Borel measurable map from $\Omega$ to $\mathbb{R}^n$. The same
interpretation is possible for an infinite sequence of random variables, as follows.


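4.11.3 Theorem. Let $(\Omega_j, \mathscr{F}_j)$, $j = 1, 2, \ldots$, be arbitrary measurable
spaces, and let $p_j$ be the projection of $\prod_{j=1}^{\infty} \Omega_j$ onto the $j$th coordinate
space. If $(\Omega, \mathscr{F})$ is a measurable space and $X: \Omega \to \prod_{j=1}^{\infty} \Omega_j$, let $X_j = p_j \circ X$,
$j = 1, 2, \ldots$. Then $X$ is measurable (relative to $\mathscr{F}$ and $\prod_{j=1}^{\infty} \mathscr{F}_j$) iff each $X_j$
is measurable (relative to $\mathscr{F}$ and $\mathscr{F}_j$). In particular, if $X_1, X_2, \ldots$ are random
variables, then $X = (X_1, X_2, \ldots)$ is measurable: $(\Omega, \mathscr{F}) \to (\mathbb{R}^{\infty}, \mathscr{B}^{\infty})$,
$\mathscr{B} = \mathscr{B}(\mathbb{R})$. For this reason $X$ is sometimes called a random sequence. The
same result holds when the range space is an arbitrary (possibly uncountable) product space.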


PROOF. If $X$ is measurable, each $X_j$ is a composition of measurable maps
and is therefore measurable. Conversely, assume each $X_j$ to be measurable.
Let $B = \{(\omega_1, \omega_2, \ldots): \omega_j \in A_j,\ j = 1, \ldots, n\}$, the $A_j \in \mathscr{F}_j$, be a measurable
rectangle in $\prod_{j=1}^{\infty} \mathscr{F}_j$. Then

\[ \{\omega \in \Omega: X(\omega) \in B\} = \bigcap_{j=1}^{n} \{\omega \in \Omega: X_j(\omega) \in A_j\} \in \mathscr{F}. \]

The proof for uncountable product spaces is essentially the same, with $B$
replaced by a measurable rectangle in $\prod_j \mathscr{F}_j$. □


The proof of 4.11.3 shows also that if $X: \Omega \to \prod_{j=1}^{n} \Omega_j$, then $X$ is
measurable iff each $X_j$, $1 \le j \le n$, is measurable.

We close the chapter with an introduction to one of the basic limit theorems 
of probability. 


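4.11.4 Weak Law of Large Numbers. Let $X_1, X_2, \ldots$ be independent random
variables (not necessarily with the same distribution), each with finite mean
and variance. Assume the variances to be uniformly bounded by $M < \infty$. Let
$S_n = X_1 + \cdots + X_n$. Then $[S_n - E(S_n)]/n$ converges in probability to 0, that
is, given $\varepsilon > 0$,

\[ P\left\{ \left| \frac{S_n - E(S_n)}{n} \right| \ge \varepsilon \right\} \to 0 \quad \text{as } n \to \infty. \]

PROOF. By Chebyshev's inequality,

\[ P\left\{ \left| \frac{S_n - E(S_n)}{n} \right| \ge \varepsilon \right\}
 \le \frac{1}{\varepsilon^2}\, E\left[ \left( \frac{S_n - E(S_n)}{n} \right)^{2} \right]
 = \frac{1}{n^2 \varepsilon^2} \operatorname{Var} S_n
 = \frac{1}{n^2 \varepsilon^2} \sum_{k=1}^{n} \operatorname{Var} X_k \quad \text{by 4.10.11} \]
\[ \le \frac{M}{n \varepsilon^2} \to 0. \quad \square \]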

There are two special cases of particular interest. 


1. If $E(X_i) = m$ for all $i$, then $[S_n - E(S_n)]/n = (S_n/n) - m$; hence
$S_n/n \to m$ in probability. Thus for large $n$, the arithmetic average of $n$
independent random variables, each with finite expectation $m$ (and with the
variances uniformly bounded) is quite likely to be very close to $m$.

2. If $X_1, X_2, \ldots$ are independent, and for each $i$, $P\{X_i = 1\} = p$,
$P\{X_i = 0\} = q = 1 - p$ (thus we have an infinite sequence of Bernoulli
trials), then $X_1 + \cdots + X_n$ is the number of successes in $n$ trials, hence $S_n/n$
is the relative frequency of successes. Since $E X_i = p$, we have $S_n/n \to p$
in probability. Thus for large $n$, the relative frequency of successes is quite
likely to be very close to $p$.
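
For Bernoulli trials the probability in 4.11.4 can be computed exactly and compared with the Chebyshev bound used in the proof. The sketch below is an illustration only; the function names and the values $p = 1/2$, $\varepsilon = 1/10$ are assumptions made here, and the constant $M$ of the theorem is taken to be $\operatorname{Var} X_1 = pq$.

```python
from fractions import Fraction
from math import comb


def exact_tail(n, p, eps):
    """Exact P{|S_n/n - p| >= eps} when S_n is binomial (n, p)."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1)
               if abs(Fraction(k, n) - p) >= eps)


def chebyshev_bound(n, p, eps):
    """The bound M / (n eps^2) from the proof, with M = Var X_1 = p q."""
    return p * (1 - p) / (n * eps**2)


# Illustrative values (assumptions, not from the text).
p, eps = Fraction(1, 2), Fraction(1, 10)
rows = [(n, exact_tail(n, p, eps), chebyshev_bound(n, p, eps))
        for n in (10, 100, 1000)]
```

Each entry of `rows` holds $n$, the exact tail probability, and the bound; the bound decreases like $1/n$, while in this example the exact probability falls off much faster.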

Intuitively, the weak law of large numbers says the following. If we regard
observation of $X_1, \ldots, X_n$ as one performance of an experiment, where $n$
is very large but fixed, then if we repeat the experiment independently,
$(S_n - E S_n)/n$ will be close to 0 a very high percentage of the time.

But physically we expect something more than this. If a coin with probability
$p$ of heads is tossed over and over again, we expect the relative frequency
to approach $p$ in the ordinary sense of convergence of a sequence of real
numbers; in other words, given $\varepsilon > 0$, eventually the relative frequency $S_n/n$ gets
and remains within $\varepsilon$ of $p$. Here we are considering observation of the infinite
sequence $X_1, X_2, \ldots$ as one performance of the experiment, and what we must
show is that $\lim_{n \to \infty} S_n(\omega)/n = p$ for almost every $\omega$. A statement of this
type is called a strong law of large numbers. This subject will be considered
in detail later.


Problems 


1. (a) If $Y_1, Y_2, \ldots$ are independent random objects, show that for each $n$,
$(Y_1, \ldots, Y_n)$ and $(Y_{n+1}, Y_{n+2}, \ldots)$ are independent.
(b) In part (a), if $Y_i: (\Omega, \mathscr{F}) \to (S, \mathscr{S})$ for all $i$, and all $Y_i$ have the
same distribution, show that $(Y_1, Y_2, \ldots)$ and $(Y_n, Y_{n+1}, \ldots)$ have
the same distribution for all $n$.

2. Consider the gambler's ruin problem, that is, the simple random walk with
absorbing barriers at 0 and $b$. In this problem, $Y_1, Y_2, \ldots$ are independent
random variables, with $P\{Y_i = 1\} = p$, $P\{Y_i = -1\} = q = 1 - p$.
Let $X_n = \sum_{k=1}^{n} Y_k$, and let $x$ be an arbitrary integer between 1 and $b - 1$.
We wish to find the probability $h(x)$ of eventual ruin starting from $x$, in
other words, the probability that $x + X_n$ will reach 0 before it reaches $b$.
Intuitive reasoning based on the theorem of total probability leads to the
result that

\[ h(x) = p\,h(x + 1) + q\,h(x - 1). \]


Give a formal proof of this result. (For further details, see Ash, 1970, 
Chapter 6.) 
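
The difference equation in Problem 2 can be sanity-checked numerically; this is not the formal proof the problem asks for. The sketch below assumes the boundary conditions $h(0) = 1$, $h(b) = 0$ and the standard closed-form solution of the recurrence; the function name and the values of $b$ and $p$ are hypothetical.

```python
from fractions import Fraction


def ruin_probability(x, b, p):
    """Closed-form candidate for h(x), assuming h(0) = 1 and h(b) = 0."""
    q = 1 - p
    if p == q:
        return 1 - Fraction(x, b)
    r = q / p
    return (r**x - r**b) / (1 - r**b)


# Illustrative parameters (assumptions, not from the text).
b, p = 10, Fraction(2, 5)
h = [ruin_probability(x, b, p) for x in range(b + 1)]

# Residuals of h(x) = p h(x+1) + q h(x-1) at the interior points 1, ..., b-1.
residuals = [h[x] - (p * h[x + 1] + (1 - p) * h[x - 1]) for x in range(1, b)]
```

All residuals vanish exactly, which is consistent with the stated recurrence but says nothing about why the recurrence holds on the underlying probability space.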


4.12 REFERENCES 

The general outline of this chapter is based on Ash (1970), which is a text 
for an undergraduate course in probability. Measure theory is not used in the 
book, although some of the underlying measure-theoretic ideas are sketched. 
Many additional examples and problems can be found in Feller (1950) and 
Parzen (1960). 

To develop intuitive skills, we recommend The Probability Tutoring Book 
by C. Ash (1993). Another useful reference is Ross (1993). 


CHAPTER 


5 


CONDITIONAL PROBABILITY 
AND EXPECTATION 


5.1 INTRODUCTION

In Chapter 4, we defined the conditional probability P(B|A) only when 
P(A) > 0. However, conditional probabilities given events of probability 
zero are in no sense degenerate cases; they occur naturally in many problems. 
For example, consider the following two-stage random experiment. A random 
variable X is observed, where X has distribution function F. If X takes the 
value x, a random variable Y is observed, where the distribution of Y de- 
pends on x. (For example, if 0 < x < 1, a coin with probability of heads x 
might be tossed independently n times, with Y the resulting number of heads.) 
Thus $P(x, B) = P\{Y \in B \mid X = x\}$ is prescribed in the statement of the problem,
although the event {X = x} may have probability zero for all values of x. 

Let us try to construct a model for the above situation. Let $\Omega = \mathbb{R}^2$,
$\mathscr{F} = \mathscr{B}(\mathbb{R}^2)$, $X(x, y) = x$, $Y(x, y) = y$. Instead of specifying the joint
distribution function of $X$ and $Y$, we specify the distribution function of $X$, and
thus the corresponding probability measure $P_X$; also, for each $x$ we are given a
probability measure $P(x, \cdot)$ defined on $\mathscr{B}(\mathbb{R})$; $P(x, B)$ is interpreted (informally
for now) as $P\{Y \in B \mid X = x\}$.

We claim that the probability of any event of the form $\{(X, Y) \in C\}$
is determined. Reasoning intuitively, the probability that $X$ falls into
$(x, x + dx]$ is $dF(x)$. Given that this occurs, in other words (roughly), given
$X = x$, $(X, Y)$ will lie in $C$ iff $Y$ belongs to the section $C(x) = \{y: (x, y) \in C\}$.
The probability of this event is $P(x, C(x))$. The total probability that $(X, Y)$
will belong to $C$ is

\[ P(C) = \int_{-\infty}^{\infty} P(x, C(x))\,dF(x). \tag{1} \]

In the special case $C = \{(x, y): x \in A,\ y \in B\} = A \times B$, $C(x) = B$ if $x \in A$ and
$C(x) = \emptyset$ if $x \notin A$; therefore

\[ P(C) = P(A \times B) = \int_A P(x, B)\,dF(x). \tag{2} \]

Now if $P(x, B)$ is Borel measurable in $x$ for each fixed $B \in \mathscr{B}(\mathbb{R})$, then
by the product measure theorem, there is a unique (probability) measure on
$\mathscr{B}(\mathbb{R}^2)$ satisfying (2) for all $A, B \in \mathscr{B}(\mathbb{R})$, namely, the measure given by (1).
Thus in the mathematical formulation of the problem, we take the probability
measure $P$ on $\mathscr{F} = \mathscr{B}(\mathbb{R}^2)$ to be the unique measure determined by $P_X$ and
the measures $P(x, \cdot)$, $x \in \mathbb{R}$.


5.2 APPLICATIONS
We apply the results of 5.1 to some typical situations in probability. 


5.2.1 Example. Let $X$ be uniformly distributed between 0 and 1. If $X = x$,
a coin with probability $x$ of heads is tossed independently $n$ times. If $Y$ is the
resulting number of heads, find $P\{Y = k\}$, $k = 0, 1, \ldots, n$.

Let us translate this into mathematical terms. Let $\Omega_1 = [0, 1]$, $\mathscr{F}_1 = \mathscr{B}[0, 1]$.
We have specified $P_X(A) = \int_A dx$ = Lebesgue measure of $A$, $A \in \mathscr{F}_1$.

For each $x$, we are given $P(x, B)$, to be interpreted as the conditional
probability that $Y \in B$, given $X = x$. We may take $\Omega_2 = \{0, 1, \ldots, n\}$, $\mathscr{F}_2$ the class
of all subsets of $\Omega_2$; then $P(x, \{k\}) = \binom{n}{k} x^k (1 - x)^{n-k}$, $k = 0, 1, \ldots, n$ (this is
Borel measurable in $x$). We take $\Omega = \Omega_1 \times \Omega_2$, $\mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2$, $P$ the unique
probability measure determined by $P_X$ and the $P(x, \cdot)$, namely,

\[ P(C) = \int_0^1 P(x, C(x))\,dP_X(x) = \int_0^1 P(x, C(x))\,dx. \]

Now let $X(x, y) = x$, $Y(x, y) = y$. Then

\[ P\{Y = k\} = P(\Omega_1 \times \{k\}) = \int_0^1 P(x, \{k\})\,dx
 = \int_0^1 \binom{n}{k} x^k (1 - x)^{n-k}\,dx = \binom{n}{k} \beta(k + 1, n - k + 1), \]

where $\beta(r, s) = \int_0^1 x^{r-1} (1 - x)^{s-1}\,dx$, $r, s > 0$, is the beta function. We can
express $\beta(r, s)$ as $\Gamma(r)\Gamma(s)/\Gamma(r + s)$, where $\Gamma(r) = \int_0^{\infty} x^{r-1} e^{-x}\,dx$, $r > 0$, is
the gamma function. Since $\Gamma(n + 1) = n!$, $n = 0, 1, \ldots$, we have

\[ P\{Y = k\} = \binom{n}{k} \frac{k!\,(n - k)!}{(n + 1)!} = \frac{1}{n + 1}, \quad k = 0, 1, \ldots, n. \]

In solving a problem of this type, intuitive reasoning serves as a useful
check on the formal development. Thus, the probability that $X$ falls near
$x$ is $dx$; given that $X = x$, the probability that $k$ heads will be obtained is
$\binom{n}{k} x^k (1 - x)^{n-k}$. Integrate this from 0 to 1 to obtain the total probability.
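
The conclusion $P\{Y = k\} = 1/(n + 1)$ of Example 5.2.1 can be confirmed by evaluating the integral exactly. The following sketch is an illustration only; the function name `prob_heads` and the choice $n = 6$ are assumptions. It expands $(1 - x)^{n-k}$ by the binomial theorem and integrates term by term.

```python
from fractions import Fraction
from math import comb


def prob_heads(n, k):
    """P{Y = k} = C(n, k) * integral over [0, 1] of x^k (1 - x)^(n - k) dx."""
    m = n - k
    # (1 - x)^m = sum_j C(m, j) (-1)^j x^j, and the integral of x^(k+j) is 1/(k+j+1).
    integral = sum(Fraction((-1)**j * comb(m, j), k + j + 1)
                   for j in range(m + 1))
    return comb(n, k) * integral


n = 6  # illustrative number of tosses
distribution = [prob_heads(n, k) for k in range(n + 1)]
```

Every entry of `distribution` equals $1/7$, in agreement with the beta-function computation.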
The next example involves an n-stage random experiment. 


5.2.2 Example. Let $X_1$ be uniformly distributed between 0 and 1. If $X_1 = x_1$,
let $X_2$ be uniformly distributed between 0 and $x_1$. In general, if $X_1 = x_1, \ldots, X_k = x_k$,
let $X_{k+1}$ be uniformly distributed between 0 and $x_k$ $(k = 1, \ldots, n - 1)$.
Find the expectation of $X_n$.

Here we have $\Omega_j = \mathbb{R}$, $\mathscr{F}_j = \mathscr{B}(\mathbb{R})$, $\Omega = \prod_{j=1}^{n} \Omega_j$, $\mathscr{F} = \prod_{j=1}^{n} \mathscr{F}_j$,
$X_j(x_1, \ldots, x_n) = x_j$, $j = 1, \ldots, n$.

Set $P_1$ = Lebesgue measure on $(0, 1)$, and for each $x_1 \in (0, 1)$,

\[ P(x_1, \cdot) = \frac{1}{x_1}\,[\text{Lebesgue measure on } (0, x_1)], \]

that is,

\[ P(x_1, B) = \frac{1}{x_1} \int_{B \cap (0, x_1)} dx. \]

In general, for each $x_1, \ldots, x_k \in (0, 1)$, $k = 1, \ldots, n - 1$, take

\[ P(x_1, \ldots, x_k, \cdot) = \frac{1}{x_k}\,[\text{Lebesgue measure on } (0, x_k)]. \]

(We use open intervals to avoid division by zero.)

Let $P$ be the unique measure on $\mathscr{F}$ determined by $P_1$ and the $P(x_1, \ldots, x_k, \cdot)$.
We may find the expectation of a Borel measurable function $g$ from $\mathbb{R}^n$ to $\mathbb{R}$
by Fubini's theorem:

\[ \int_{\Omega} g\,dP = \int_{\Omega_1} P_1(dx_1) \int_{\Omega_2} P(x_1, dx_2) \cdots \int_{\Omega_n} g(x_1, \ldots, x_n)\, P(x_1, \ldots, x_{n-1}, dx_n). \]

In the present case we have $g(x_1, \ldots, x_n) = X_n(x_1, \ldots, x_n) = x_n$. Thus

\[ E(X_n) = \int_0^1 dx_1 \int_0^{x_1} \frac{1}{x_1}\,dx_2 \cdots \int_0^{x_{n-2}} \frac{1}{x_{n-2}}\,dx_{n-1} \int_0^{x_{n-1}} \frac{x_n}{x_{n-1}}\,dx_n
 = \int_0^1 \frac{x_1}{2^{n-1}}\,dx_1 = 2^{-n}. \]

This example has an alternative interpretation. Let $Y_1, \ldots, Y_n$ be independent
random variables, each uniformly distributed between 0 and 1. Let $Z_k$
be the product $Y_1 \cdots Y_k$, $1 \le k \le n$. It turns out (see Problem 2, Section 5.6)
that $(Z_1, \ldots, Z_n)$ has the same distribution as $(X_1, \ldots, X_n)$; hence $E(X_n)
= E(Z_n)$. Since the expectation of a product of independent nonnegative random
variables is the product of the expectations, $E(Z_n) = \prod_{i=1}^{n} E(Y_i) = 2^{-n}$
as before.
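
The iterated integral for $E(X_n)$ in Example 5.2.2 can be evaluated mechanically. In the sketch below (an illustration; the function name and the polynomial representation are choices made here), each inner stage is the operator $f \mapsto \frac{1}{x}\int_0^x f(t)\,dt$, which sends $x^m$ to $x^m/(m + 1)$, and the outermost stage integrates over $(0, 1)$.

```python
from fractions import Fraction


def expectation_last_stage(n):
    """E(X_n) for the n-stage experiment, by exact iterated integration."""
    # Polynomial in one variable stored as {power: coefficient}; start with g = x_n.
    poly = {1: Fraction(1)}
    for _ in range(n - 1):
        # Inner stage: (1/x) * integral from 0 to x of t^m dt = x^m / (m + 1).
        poly = {m: c / (m + 1) for m, c in poly.items()}
    # Outermost stage: integrate against Lebesgue measure on (0, 1).
    return sum(c / (m + 1) for m, c in poly.items())


values = [expectation_last_stage(n) for n in range(1, 9)]
```

The list `values` reproduces $2^{-1}, 2^{-2}, \ldots, 2^{-8}$.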


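5.2.3 Example. Let $X$ be a discrete random variable, taking on the positive
integer values $1, 2, \ldots$ with probabilities $p_1, p_2, \ldots$ ($p_i \ge 0$, $\sum_i p_i = 1$). If
$X = n$, a nonnegative number $Y$ is selected according to the density $f_n$. Find
the probability that $1 \le X + Y \le 3$.

Here we have

\[ \Omega_1 = \{1, 2, \ldots\}, \quad \mathscr{F}_1 = \text{the class of all subsets of } \Omega_1, \quad P_1\{k\} = p_k,\ k = 1, 2, \ldots, \]
\[ \Omega_2 = \mathbb{R}, \quad \mathscr{F}_2 = \mathscr{B}(\mathbb{R}), \quad P(n, B) = \int_B f_n(x)\,dx, \]
\[ X(\omega_1, \omega_2) = \omega_1, \quad Y(\omega_1, \omega_2) = \omega_2. \]

The measure determined by $P_1$ and the $P(n, \cdot)$ is given by

\[ P(F) = \int_{\Omega_1} P(\omega_1, F(\omega_1))\,P_1(d\omega_1), \quad F \in \mathscr{F}_1 \times \mathscr{F}_2, \]
\[ \phantom{P(F)} = \sum_{n=1}^{\infty} P(n, F(n))\,p_n \quad [\text{see 4.10.3(d)}]. \]

If $F = \{(\omega_1, \omega_2): 1 \le \omega_1 + \omega_2 \le 3\}$, then $F(n) = \{\omega_2: 1 - n \le \omega_2 \le 3 - n\}$; hence

\[ P(F) = P\{1 \le X + Y \le 3\} = \sum_{n=1}^{\infty} p_n \int_{1-n}^{3-n} f_n(x)\,dx
 = p_1 \int_0^2 f_1(x)\,dx + p_2 \int_0^1 f_2(x)\,dx \]

since $f_n(x) = 0$ for $x < 0$.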

Note that if $p_n > 0$, then $P\{Y \in B \mid X = n\}$ is defined and equals $P\{X = n,
Y \in B\}/P\{X = n\}$. But $P\{X = n, Y \in B\} = P\{(\omega_1, \omega_2): \omega_1 = n,\ \omega_2 \in B\}
= P(n, B)\,p_n$. Thus $P\{Y \in B \mid X = n\} = P(n, B)$, as we would expect
intuitively.
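
To make Example 5.2.3 concrete, one can plug in specific densities. The sketch below is purely illustrative: it assumes that $X$ takes the values 1, 2, 3 with probabilities $1/2$, $1/3$, $1/6$ and that $f_n$ is the exponential density with rate $n$; none of these choices comes from the text.

```python
import math


def exp_cdf(n, y):
    """Distribution function of an exponential density with rate n (0 for y <= 0)."""
    return 1.0 - math.exp(-n * y) if y > 0 else 0.0


def prob_sum_in_interval(p, cdf, lo=1.0, hi=3.0):
    """P{lo <= X + Y <= hi} = sum over n of p_n * [F_n(hi - n) - F_n(lo - n)]."""
    return sum(p_n * (cdf(n, hi - n) - cdf(n, lo - n)) for n, p_n in p.items())


# Hypothetical distribution of X (an assumption for illustration).
p = {1: 0.5, 2: 1 / 3, 3: 1 / 6}
value = prob_sum_in_interval(p, exp_cdf)
```

Only $n = 1$ and $n = 2$ contribute, as in the formula $p_1 \int_0^2 f_1 + p_2 \int_0^1 f_2$; with these densities both integrals equal $1 - e^{-2}$.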

For additional examples, see Ash (1970, Chapter 4). 


5.3 THE GENERAL CONCEPT OF CONDITIONAL PROBABILITY AND
EXPECTATION
We have seen that specification of the distribution of a random variable $X$,
together with $P(x, B)$, $x$ real, $B \in \mathscr{B}(\mathbb{R})$, interpreted intuitively as the
conditional probability that $Y \in B$ given $X = x$, determines a unique probability
measure on $\mathscr{B}(\mathbb{R}^2)$, so that there is only one reasonable joint distribution of
$X$ and $Y$ consistent with the given data. However, this somewhat oblique
approach has not resolved the difficulty of defining conditional probabilities
given events with probability zero. For example, if $X$ is a random object
on $(\Omega, \mathscr{F}, P)$, that is, $X: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$, and $B \in \mathscr{F}$, we may ask
whether it is possible to define in a meaningful way the conditional probability
$P(x, B) = P(B \mid X = x)$, even though the event $\{X = x\}$ may have probability
zero for some, in fact perhaps for all, $x$.

By the discussion in 5.1, if we have a reasonable conditional probability
$P(x, B)$, it should satisfy

\[ P(\{X \in A\} \cap B) = \int_A P(x, B)\,dP_X(x), \]

where $P_X$ is the probability measure induced by $X$, namely,
$P_X(A) = P\{\omega: X(\omega) \in A\}$, $A \in \mathscr{F}'$.
In fact, this requirement determines $P(x, B)$ in the following sense.

5.3.1 Theorem. Let $X: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$ be a random object on
$(\Omega, \mathscr{F}, P)$, and let $B$ be a fixed set in $\mathscr{F}$. Then there is a real-valued Borel
measurable function $g$ on $(\Omega', \mathscr{F}')$ such that for each $A \in \mathscr{F}'$,

\[ P(\{X \in A\} \cap B) = \int_A g(x)\,dP_X(x). \]

Furthermore, if $h$ is another such function then $g = h$ a.e. $[P_X]$. [We define
$P(B \mid X = x)$ as $g(x)$; it is essentially unique for a given $B$.]


PROOF. Let $\lambda(A) = P(\{X \in A\} \cap B)$, $A \in \mathscr{F}'$. Then $\lambda$ is a finite measure on
$\mathscr{F}'$, absolutely continuous with respect to $P_X$ [$P_X(A) = 0$ implies $\lambda(A) = 0$].
The result follows from the Radon-Nikodym theorem. □


Let us verify that the conditional probability we have just introduced coin- 
cides with our intuition in simple cases. 


5.3.2 Examples. (a) Let $X$ take on only countably many values $x_1, x_2, \ldots$,
with

\[ p_i = P\{X = x_i\} > 0, \quad \sum_{i=1}^{\infty} p_i = 1. \]

We claim that

\[ g(x_i) = P(B \mid X = x_i) = \frac{P(B \cap \{X = x_i\})}{P\{X = x_i\}}, \quad i = 1, 2, \ldots. \]

(Since $P_X$ is concentrated on the $x_i$, we need not bother to specify $g(x)$ for $x$
unequal to any of the $x_i$.) Thus the general definition reduces to the elementary
definition in the discrete case. To prove this, let $\Omega' = \{x_1, x_2, \ldots\}$, with $\mathscr{F}'$
the collection of all subsets of $\Omega'$. If $A \in \mathscr{F}'$ and $g$ is defined as above,
then

\[ \int_A g\,dP_X = \int_{\Omega'} g I_A\,dP_X = \sum_{i=1}^{\infty} g(x_i) I_A(x_i) P\{X = x_i\} \quad [\text{see 4.10.3(d)}] \]
\[ = \sum_{x_i \in A} g(x_i) P\{X = x_i\} = \sum_{x_i \in A} P(B \cap \{X = x_i\}) = P(\{X \in A\} \cap B). \]

Since there is essentially only one $g$ satisfying

\[ \int_A g\,dP_X = P(\{X \in A\} \cap B), \quad A \in \mathscr{F}', \]

the $g$ we proposed must be correct.
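
The verification in part (a) can be replayed on a small finite sample space. The sketch below is an illustration under assumed data: the four-point space, the probabilities, the random variable `X`, and the event `B` are all hypothetical. It computes $g(x_i)$ from the elementary formula and checks the defining identity for every subset $A$ of the range of $X$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite sample space: outcome -> probability.
P = {"a": Fraction(1, 10), "b": Fraction(2, 10),
     "c": Fraction(3, 10), "d": Fraction(4, 10)}
X = {"a": 1, "b": 1, "c": 2, "d": 2}   # random variable with values 1 and 2
B = {"a", "c"}                          # the fixed event B


def prob(event):
    """Probability of a set of outcomes."""
    return sum(P[w] for w in event)


values = sorted(set(X.values()))

# g(x_i) = P(B intersect {X = x_i}) / P{X = x_i}
g = {x: prob({w for w in B if X[w] == x}) / prob({w for w in P if X[w] == x})
     for x in values}


def subsets(items):
    """All subsets of items, as tuples."""
    return chain.from_iterable(combinations(items, r)
                               for r in range(len(items) + 1))


# Check: sum over x_i in A of g(x_i) P{X = x_i} equals P({X in A} intersect B).
identity_holds = all(
    sum(g[x] * prob({w for w in P if X[w] == x}) for x in A)
    == prob({w for w in B if X[w] in A})
    for A in subsets(values)
)
```

Here $g(1) = 1/3$ and $g(2) = 3/7$, and `identity_holds` confirms that the discrete sum plays the role of $\int_A g\,dP_X$.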
(b) Let $X$ and $Y$ be random variables with joint density $f$

\[ \Bigl[\Omega = \mathbb{R}^2,\ \mathscr{F} = \mathscr{B}(\mathbb{R}^2),\ X(x, y) = x,\ Y(x, y) = y,\ P(A) = \iint_A f(x, y)\,dx\,dy,\ A \in \mathscr{F}\Bigr]. \]

Now $\{X = x\}$ has probability zero for each $x$, but there is a reasonable
approach to the conditional probability $P\{Y \in C \mid X = x\}$, as follows:

\[ P\{Y \in C \mid x - h < X < x + h\} = \frac{P\{x - h < X < x + h,\ Y \in C\}}{P\{x - h < X < x + h\}}
 = \frac{\int_{u=x-h}^{x+h} \int_{y \in C} f(u, y)\,du\,dy}{\int_{x-h}^{x+h} f_1(u)\,du}, \]

where $f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$ is the density of $X$.
For small $h$, this is (hopefully) approximately

\[ \frac{2h \int_C f(x, y)\,dy}{2h f_1(x)} = \int_C \frac{f(x, y)}{f_1(x)}\,dy. \]

We are led to define

\[ h(y \mid x) = \frac{f(x, y)}{f_1(x)} \]

as the conditional density of $Y$, given $X = x$ (or for short, the conditional
density of $Y$ given $X$). Note that $h$ is defined only when $f_1(x) \ne 0$; however,
if $S = \{(x, y): f_1(x) = 0\}$, then $P\{(X, Y) \in S\} = 0$, since

\[ P\{(X, Y) \in S\} = \iint_S f(x, y)\,dx\,dy = \int_{\{x:\, f_1(x) = 0\}} \left[ \int_{y=-\infty}^{\infty} f(x, y)\,dy \right] dx
 = \int_{\{x:\, f_1(x) = 0\}} f_1(x)\,dx = 0. \]

Thus we may essentially ignore those $(x, y)$ for which the conditional density
is not defined.

We expect that $P\{Y \in C \mid X = x\} = \int_C h(y \mid x)\,dy$. More generally, if $B \in \mathscr{F}$
and $X = x$, then $B$ will occur iff $Y \in B(x)$. To find $P\{Y \in B(x) \mid X = x\}$, we
integrate $h(y \mid x)$ over $y \in B(x)$. Thus we propose

\[ g(x) = \int_{B(x)} h(y \mid x)\,dy, \quad B \in \mathscr{F},\ x \in \mathbb{R}, \]

as the conditional probability of $B$ given $X = x$. To prove this, first note that

\[ g(x) = \int_{-\infty}^{\infty} I_B(x, y)\, h(y \mid x)\,dy; \]

hence $g$ is Borel measurable by Fubini's theorem. Also, if $A \in \mathscr{B}(\mathbb{R})$,

\[ P(\{X \in A\} \cap B) = \iint_{x \in A,\ (x, y) \in B} f(x, y)\,dx\,dy
 = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} I_B(x, y)\, h(y \mid x)\,dy \right] I_A(x) f_1(x)\,dx \]
\[ = \int_{x \in A} \left[ \int_{y \in B(x)} h(y \mid x)\,dy \right] f_1(x)\,dx
 = \int_A g(x) f_1(x)\,dx
 = \int_A g(x)\,dP_X(x) \quad [\text{see 4.10.3(c)}]. \]

Therefore $g(x) = P(B \mid X = x)$.
In this example we may look at the formula $f(x, y) = f_1(x) h(y \mid x)$ in two
ways. If $(X, Y)$ has density $f$, we have a notion of conditional probability:
$P\{Y \in C \mid X = x\} = \int_C h(y \mid x)\,dy$. On the other hand, suppose that we specify
that $X$ has density $f_1$, and whenever $X = x$, we select $Y$ according to the
density $h(\cdot \mid x)$; in other words, we specify $P(x, B) = \int_B h(y \mid x)\,dy$, $B \in \mathscr{B}(\mathbb{R})$.
A unique measure $P$ on $\mathscr{B}(\mathbb{R}^2)$ is determined, satisfying, for $A, B \in \mathscr{B}(\mathbb{R})$,

\[ P\{X \in A, Y \in B\} = \int_A P(x, B)\,dP_X(x) = \iint_{x \in A,\ y \in B} f_1(x)\, h(y \mid x)\,dx\,dy. \]

Therefore $(X, Y)$ has density $f(x, y) = f_1(x) h(y \mid x)$.

Thus we have two points of view. We may regard the conditional density of 
Y given X = x as ultimately derived from the joint density of X and Y. On the 
other hand, we may regard the observation of X and Y as a two-stage random 
experiment, where the distribution of Y at stage 2 depends on the value of X 
at stage 1. The above discussion shows that the assignment of probabilities to 
events involving (X, Y) is the same in either case. 

We may also define conditional densities in higher dimensions. For example,
if $X, Y, Z, W$ have joint density $f$, we define (say) the conditional density of
$(Z, W)$, given $(X, Y)$, as

\[ h(z, w \mid x, y) = \frac{f(x, y, z, w)}{f_{XY}(x, y)}, \]

where $f_{XY}(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y, z, w)\,dz\,dw$. If $B \in \mathscr{B}(\mathbb{R}^4)$, then, exactly
as before,

\[ P(B \mid X = x, Y = y) = \iint_{B(x, y)} h(z, w \mid x, y)\,dz\,dw. \]

This is verified by proving that, for $A \in \mathscr{B}(\mathbb{R}^2)$,

\[ P(\{(X, Y) \in A\} \cap B) = \iint_A P(B \mid X = x, Y = y)\, f_{XY}(x, y)\,dx\,dy. \]


(c) Let $(\Omega_1, \mathscr{F}_1)$ and $(\Omega_2, \mathscr{F}_2)$ be given, with no probability defined as
yet. Take $\Omega = \Omega_1 \times \Omega_2$, $\mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2$, $X(\omega_1, \omega_2) = \omega_1$, $Y(\omega_1, \omega_2) = \omega_2$. Assume
that we are given $P_X$, a probability measure on $(\Omega_1, \mathscr{F}_1)$, and also that we
are given $P(x, B)$, $x \in \Omega_1$, $B \in \mathscr{F}_2$, a probability measure in $B$ for each fixed
$x$, and a Borel measurable function of $x$ for each fixed $B$. (We are specifying
the distribution of $X$ and the conditional distribution of $Y$, given $X = x$.) By
the product measure theorem, there is a unique measure $P$ on $\mathscr{F}$ such that

\[ P\{X \in A, Y \in B\} = P(A \times B) = \int_A P(x, B)\,dP_X(x). \]

It follows that $P(x, B)$ is in fact the conditional probability $P\{Y \in B \mid X = x\}$.

We now consider conditional expectation. Let $X$ and $Y$ be random variables
on $(\Omega, \mathscr{F}, P)$; we ask for a reasonable definition of the expectation of $Y$ given
that $X = x$, written $E(Y \mid X = x)$. Intuitively, $E(Y \mid X = x)$ should reflect the
long-run average value of $Y$ in a sequence of independent trials when we look
only at those observations on which $\{X = x\}$ has occurred.

If $X$ and $Y$ are discrete and we are given that $X = x$, the conditional
probability of an event involving $Y$ is governed by the set of conditional
probabilities $p(y \mid x) = P\{Y = y \mid X = x\}$. Thus a reasonable figure for $E(Y \mid X = x)$
is $\sum_y y\,p(y \mid x)$. Similarly, if $(X, Y)$ is absolutely continuous, and $h = h(y \mid x)$ is
the conditional density of $Y$ given $X = x$, we expect that $E(Y \mid X = x)$ should
be $\int_{-\infty}^{\infty} y\,h(y \mid x)\,dy$. What we need is a general framework that includes these
special cases.

Let $Y$ be a random variable (or an extended random variable) on $(\Omega, \mathscr{F}, P)$,
and let $X: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$ be a random object. Our general definition of
conditional probability hinges on a version of the theorem of total probability:

\[ P(\{X \in A\} \cap B) = \int_A P(B \mid X = x)\,dP_X(x), \quad A \in \mathscr{F}',\ B \in \mathscr{F}. \]

There is a closely related "theorem of total expectation," which may be
developed intuitively as follows. The probability that $X$ falls near $x$ is $dP_X(x)$;
given that $X = x$, the average value of $Y$ is what we are looking for, namely,
$E(Y \mid X = x)$. It is reasonable to hope that the total expectation may be found
by adding all the contributions:

\[ E(Y) = \int_{\Omega'} E(Y \mid X = x)\,dP_X(x). \]

To develop this further, we replace $Y$ by $Y I_{\{X \in A\}}$, where $A \in \mathscr{F}'$. If $x \in A$,
we expect that $E(Y I_{\{X \in A\}} \mid X = x) = E(Y \mid X = x)$ since $X(\omega) = x \in A$ implies
$I_{\{X \in A\}}(\omega) = 1$. If $x \notin A$, we expect that $E(Y I_{\{X \in A\}} \mid X = x) = 0$. Replacing $Y$
by $Y I_{\{X \in A\}}$ in the above version of the theorem of total expectation, we obtain

\[ E(Y I_{\{X \in A\}}) = \int_{\Omega'} E(Y I_{\{X \in A\}} \mid X = x)\,dP_X(x) \]

or

\[ \int_{\{X \in A\}} Y\,dP = \int_A E(Y \mid X = x)\,dP_X(x). \]

In fact, this requirement essentially determines $E(Y \mid X = x)$.
5.3.3 Theorem. Let $Y$ be an extended random variable on $(\Omega, \mathscr{F}, P)$, and
$X: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$, a random object. If $E(Y)$ exists, there is a function
$g: (\Omega', \mathscr{F}') \to (\overline{\mathbb{R}}, \overline{\mathscr{B}})$ such that for each $A \in \mathscr{F}'$,

\[ \int_{\{X \in A\}} Y\,dP = \int_A g(x)\,dP_X(x). \]

(As usual, $\overline{\mathscr{B}}$ denotes the class of Borel sets.) Furthermore, if $h$ is another such
function, then $g = h$ a.e. $[P_X]$. [We define $E(Y \mid X = x)$ as $g(x)$; it is essentially
unique for a given $Y$.]

PROOF. Let

\[ \lambda(A) = \int_{\{X \in A\}} Y\,dP = \int_{X^{-1}(A)} Y\,dP, \quad A \in \mathscr{F}'. \]

Then $\lambda$ is a countably additive set function on $\mathscr{F}'$ by 1.6.1, and is absolutely
continuous with respect to $P_X$ since $P_X(A) = P\{X \in A\}$. The result follows
from the Radon-Nikodym theorem. □


Conditional expectation includes conditional probability as a special case, 
as we now prove. 


5.3.4 Corollary. If $X$ is a random object on $(\Omega, \mathscr{F}, P)$ and $B \in \mathscr{F}$, then
$E(I_B \mid X = x) = P(B \mid X = x)$ a.e. $[P_X]$.

PROOF. In 5.3.3, set $Y = I_B$; the defining equation for conditional expectation
becomes

\[ P(\{X \in A\} \cap B) = \int_A E(I_B \mid X = x)\,dP_X(x). \]

The result now follows from 5.3.1. □


Let us compare the general definition with the intuitive concept in special 
cases. 


5.3.5 Examples. (a) Let $X$ take on only countably many values $x_1, x_2, \ldots$
(assume all $P\{X = x_i\} > 0$). We have seen that

\[ P(B \mid X = x_i) = \frac{P(B \cap \{X = x_i\})}{P\{X = x_i\}}, \quad B \in \mathscr{F}. \]

Thus we should expect that

\[ E(I_B \mid X = x_i) = \frac{1}{P\{X = x_i\}} \int_{\{X = x_i\}} I_B\,dP. \]

Proceeding from indicators to nonnegative simple functions to nonnegative
measurable functions to arbitrary measurable functions, we should like to
believe that if $E(Y)$ exists,

\[ E(Y \mid X = x_i) = \frac{1}{P\{X = x_i\}} \int_{\{X = x_i\}} Y\,dP, \quad i = 1, 2, \ldots. \tag{1} \]

We are not proving anything here since we do not yet know, for example,
that

\[ E\Bigl( \sum_{i=1}^{n} Y_i \Bigm| X = x \Bigr) = \sum_{i=1}^{n} E(Y_i \mid X = x). \]

To establish (1), let

\[ g(x_i) = \frac{1}{P\{X = x_i\}} \int_{\{X = x_i\}} Y\,dP, \quad i = 1, 2, \ldots. \]

(We may assume $\Omega' = \{x_1, x_2, \ldots\}$, with $\mathscr{F}'$ the class of all subsets of $\Omega'$.)
Then

\[ \int_{\{X \in A\}} Y\,dP = \sum_{x_i \in A} P\{X = x_i\}\, \frac{1}{P\{X = x_i\}} \int_{\{X = x_i\}} Y\,dP
 = \sum_{x_i \in A} P\{X = x_i\}\, g(x_i) = \int_A g(x)\,dP_X(x), \quad A \in \mathscr{F}', \]

as desired.

In the special case when $Y$ is discrete, (1) assumes a simpler form. If $Y$
takes on the values $y_1, y_2, \ldots$, we obtain (using countable additivity of the
integral)

\[ E(Y \mid X = x_i) = \sum_j y_j\, \frac{P\{X = x_i, Y = y_j\}}{P\{X = x_i\}} = \sum_j y_j\, P\{Y = y_j \mid X = x_i\}. \tag{2} \]
(b) Let $B \in \mathscr{F}$, and assume $P(B) > 0$. If $E(Y)$ exists, we define the
conditional expectation of $Y$ given $B$, as follows. Let $X = I_B$, and set $E(Y \mid B)
= E(Y \mid X = 1)$. This is a special case of (a); we obtain [see (1)]

\[ E(Y \mid B) = \frac{1}{P(B)} \int_B Y\,dP; \]

in other words,

\[ E(Y \mid B) = \frac{E(Y I_B)}{P(B)}. \tag{3} \]


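(c) Let $X$ and $Y$ be random variables having a joint density $f$, and let
$h = h(y \mid x)$ be the conditional density of $Y$ given $X$. We claim that if $E(Y)$
exists,

\[ E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, h(y \mid x)\,dy. \tag{4} \]

To prove this, note that

\[ \int_{\{X \in A\}} Y\,dP = \iint_{\{(x, y):\, x \in A\}} y f(x, y)\,dx\,dy
 = \int_{x \in A} f_1(x) \left[ \int_{-\infty}^{\infty} y\, h(y \mid x)\,dy \right] dx \quad \text{by Fubini's theorem} \]
\[ = \int_A \left[ \int_{-\infty}^{\infty} y\, h(y \mid x)\,dy \right] dP_X(x), \]

proving (4).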


Notice also that if $q$ is a Borel measurable function from $\mathbb{R}$ to $\mathbb{R}$ and $E[q(Y)]$
exists, then

\[ E(q(Y) \mid X = x) = \int_{-\infty}^{\infty} q(y)\, h(y \mid x)\,dy \tag{5} \]

by the same argument as above. Similarly, if $X$ and $Y$ are discrete [see (a),
(2)] and $E[q(Y)]$ exists, then

\[ E(q(Y) \mid X = x_i) = \sum_j q(y_j)\, P\{Y = y_j \mid X = x_i\}. \tag{6} \]
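
Formula (4) can be tried out numerically on a concrete density. The sketch below is an illustration only: it assumes the joint density $f(x, y) = x + y$ on the unit square (zero elsewhere), which is not taken from the text, and uses a simple midpoint rule; the helper names are hypothetical.

```python
def integrate(func, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))


def f(x, y):
    """Hypothetical joint density on the unit square: f(x, y) = x + y."""
    return x + y


def f1(x):
    """Marginal density of X, obtained by integrating out y."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)


def cond_exp(x):
    """E(Y | X = x) = integral of y h(y|x) dy, with h(y|x) = f(x, y) / f1(x)."""
    return integrate(lambda y: y * f(x, y), 0.0, 1.0) / f1(x)


# Theorem of total expectation: E(Y) = integral of E(Y | X = x) f1(x) dx.
total = integrate(lambda x: cond_exp(x) * f1(x), 0.0, 1.0, steps=200)
direct = integrate(
    lambda x: integrate(lambda y: y * f(x, y), 0.0, 1.0), 0.0, 1.0, steps=200)
```

For this density $E(Y \mid X = x) = (x/2 + 1/3)/(x + 1/2)$, so `cond_exp(0.5)` is close to $7/12$, and both `total` and `direct` approximate $E(Y) = 7/12$.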
(d) Let $(\Omega_1, \mathscr{F}_1)$ and $(\Omega_2, \mathscr{F}_2)$ be given, with no probability defined as
yet. Let $\Omega = \Omega_1 \times \Omega_2$, $\mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2$, $X(x, y) = x$, $Y(x, y) = y$. Assume that
a probability measure $P_X$ on $\mathscr{F}_1$ is given, and also that we are given $P(x, B)$,
$x \in \Omega_1$, $B \in \mathscr{F}_2$, a probability measure in $B$ for each fixed $x$, and a Borel
measurable function of $x$ for each fixed $B$. Let $P$ be the unique measure on $\mathscr{F}$
determined by $P_X$ and the $P(x, \cdot)$.

If $f: (\Omega_2, \mathscr{F}_2) \to (\mathbb{R}, \mathscr{B}(\mathbb{R}))$ and $E[f(Y)]$ exists, we claim that

\[ E(f(Y) \mid X = x) = \int_{\Omega_2} f(y)\, P(x, dy). \tag{7} \]

To see this, we note, with the aid of Fubini's theorem, that

\[ \int_{\{X \in A\}} f(Y)\,dP = \int_{\Omega} f(Y) I_{\{X \in A\}}\,dP
 = \int_{\Omega_1} \int_{\Omega_2} f(y) I_A(x)\, P(x, dy)\,dP_X(x)
 = \int_A \left[ \int_{\Omega_2} f(y)\, P(x, dy) \right] dP_X(x). \]


Problems 


1. Let $X$ and $Y$ be random variables with joint density $f(x, y)$. Indicate how
to compute the following quantities.

(a) $E(g(X) \mid Y = y)$, where $g$ is a Borel measurable function from $\mathbb{R}$ to $\mathbb{R}$
such that $E[g(X)]$ exists;

(b) $E(Y \mid A)$, where $A = \{X \in B\}$, $B \in \mathscr{B}(\mathbb{R})$;

(c) $E(X \mid A)$, where $A = \{X + Y \in B\}$, $B \in \mathscr{B}(\mathbb{R})$.

2. Let $X$ be a random variable with density $f_0(\lambda)$. If $X = \lambda$, $n$ independent
observations $X_1, \ldots, X_n$ are taken, where each $X_i$ has density $f_\lambda(x)$.
Indicate how to compute the conditional expectation of $g(X)$, given
$X_1 = x_1, \ldots, X_n = x_n$.

3. Let $X$ be a discrete random variable; if $X = x$, let $Y$ have a conditional
density $h(y \mid x)$. Show that

\[ P\{X = x \mid Y = y\} = \frac{P\{X = x\}\, h(y \mid x)}{\sum_{x'} P\{X = x'\}\, h(y \mid x')}. \]
4. Let $X$ be an absolutely continuous random variable. If $X = x$, let $Y$ be
discrete, with $P\{Y = y \mid X = x\} = p(y \mid x)$ specified. Show that there is a
conditional density of $X$ given $Y$, namely,

\[ h(x \mid y) = \frac{f_X(x)\, p(y \mid x)}{p_Y(y)}, \]

where

\[ p_Y(y) = P\{Y = y\} = \int_{-\infty}^{\infty} f_X(x)\, p(y \mid x)\,dx. \]

5. (a) Let $X$ be a discrete random variable. If $X = \lambda$, $n$ independent
observations $X_1, \ldots, X_n$ are taken, where each $X_i$ has density $f_\lambda(x)$.
Indicate how to compute $E(g(X) \mid X_1 = x_1, \ldots, X_n = x_n)$.

(b) Let $X$ be an absolutely continuous random variable. If $X = \lambda$, $n$
independent observations $X_1, \ldots, X_n$ are taken, where each $X_i$ is discrete
with probability function $p_\lambda(x)$. Indicate how to compute
$E(g(X) \mid X_1 = x_1, \ldots, X_n = x_n)$.

6. If $X$ is a random vector with density $f$, and $A = \{X \in B_0\}$, $B_0 \in \mathscr{B}(\mathbb{R}^n)$,
show that there is a conditional density for $X$ given $A$, namely,

\[ f(x \mid A) = \begin{cases} \dfrac{f(x)}{P(A)} & \text{if } x \in B_0, \\[1ex] 0 & \text{if } x \notin B_0. \end{cases} \]

The interpretation of the conditional density is that

\[ P\{X \in B \mid A\} = \int_B f(x \mid A)\,dx, \quad B \in \mathscr{B}(\mathbb{R}^n). \]

7. Let $B_1, B_2, \ldots$ be mutually exclusive and exhaustive events with strictly
positive probability. Establish the following version of the theorem of
total expectation: If $E(X)$ exists, then

\[ E(X) = \sum_{n=1}^{\infty} P(B_n)\, E(X \mid B_n). \]


8. Let $X$ and $Y$ be nonnegative random variables, such that $(X, Y)$ has
induced probability measure

\[ P_{XY} = \tfrac{1}{2} P_1 + \tfrac{1}{2} P_2, \]

where, for $B \in \mathscr{B}([0, \infty) \times [0, \infty))$, $P_1$ = point mass at $(1, 2)$, that is,

\[ P_1(B) = \begin{cases} 1 & \text{if } (1, 2) \in B, \\ 0 & \text{if } (1, 2) \notin B, \end{cases} \]

and

\[ P_2(B) = \iint_B e^{-x} e^{-y}\,dx\,dy. \]

Thus with probability $\tfrac{1}{2}$, $(X, Y) = (1, 2)$; with probability $\tfrac{1}{2}$, $X$ and
$Y$ are chosen independently, each with density $e^{-x}$, $x \ge 0$. Calculate
$P\{Y \in B \mid X = x\}$, $x \ge 0$, $B \in \mathscr{B}[0, \infty)$. (Hint: think like a statistician; if you
observe that $X = 1$, it is a moral certainty that you are operating under $P_1$; if
$X \ne 1$, it is an absolute certainty that you are operating under $P_2$.)


5.4 CONDITIONAL EXPECTATION GIVEN A σ-FIELD

It will be very convenient to regard conditional expectations as functions 
defined on the sample space Q. Let us first recall the main result of the 
previous section. 

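If $Y$ is an extended random variable on $(\Omega, \mathscr{F}, P)$ whose expectation exists,
and $X: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$ is a random object, then $g(x) = E(Y \mid X = x)$ is
characterized as the a.e. $[P_X]$ unique function: $(\Omega', \mathscr{F}') \to (\overline{\mathbb{R}}, \overline{\mathscr{B}})$ satisfying

\[ \int_{\{X \in A\}} Y\,dP = \int_A E(Y \mid X = x)\,dP_X(x), \quad A \in \mathscr{F}'. \tag{1} \]

Now let $h(\omega) = g(X(\omega))$; then $h: (\Omega, \mathscr{F}) \to (\overline{\mathbb{R}}, \overline{\mathscr{B}})$ [see Fig. 5.4.1].

\[ (\Omega, \mathscr{F}) \xrightarrow{\;X\;} (\Omega', \mathscr{F}') \xrightarrow{\;g\;} (\overline{\mathbb{R}}, \overline{\mathscr{B}}), \qquad h = g \circ X \]

Figure 5.4.1

Thus $h(\omega)$ is the conditional expectation of $Y$, given that $X$ takes the value
$x = X(\omega)$; consequently, $h$ measures the average value of $Y$ given $X$, but $h$ is
defined on $\Omega$ rather than $\Omega'$.

It will be useful to have an analog of (1) for $h$. We claim that

\[ \int_{\{X \in A\}} h\,dP = \int_{\{X \in A\}} Y\,dP \quad \text{for each } A \in \mathscr{F}'. \tag{2} \]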
To prove this, note that

\[ \int_{\{X \in A\}} h\,dP = \int_{\Omega} g(X(\omega))\, I_A(X(\omega))\,dP(\omega)
 = \int_{\Omega'} g(x) I_A(x)\,dP_X(x) \quad [\text{by 4.10.3(a)}] \]
\[ = \int_A g(x)\,dP_X(x) = \int_{\{X \in A\}} Y\,dP \quad [\text{by (1)}]. \]

Since $\{X \in A\} = X^{-1}(A) = \{\omega \in \Omega: X(\omega) \in A\}$, we may express (2) as
follows:

\[ \int_C h\,dP = \int_C Y\,dP \quad \text{for each } C \in X^{-1}(\mathscr{F}'), \tag{3} \]

where $X^{-1}(\mathscr{F}') = \{X^{-1}(A): A \in \mathscr{F}'\}$.
The σ-field $X^{-1}(\mathscr{F}')$ will be very important for us, and we shall look at
some of its properties before proceeding.


5.4.1 Definition. Let $X: (\Omega, \mathscr{F}) \to (\Omega', \mathscr{F}')$ be a random object. The
σ-field induced by $X$ is given by

\[ \sigma(X) = X^{-1}(\mathscr{F}'). \]

Thus a set in $\sigma(X)$ is of the form $\{X \in A\}$ for some $A \in \mathscr{F}'$. In particular,
if $X = (X_1, \ldots, X_n)$, a random vector, $\sigma(X)$ consists of all sets $\{X \in B\}$,
$B \in \mathscr{B}(\mathbb{R}^n)$.


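The induced $\sigma$-field has the following properties.


5.4.2 Theorem. Let $X: (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$.

(a) The induced $\sigma$-field $\sigma(X)$ is the smallest $\sigma$-field $\mathcal{G}$ of subsets of $\Omega$
making $X$ measurable relative to $\mathcal{G}$ and $\mathcal{F}'$.

(b) If $\Omega' = \prod_j \Omega_j$, $\mathcal{F}' = \prod_j \mathcal{F}_j$, so that $X = (X_1, X_2, \ldots)$, where
$X_j: (\Omega, \mathcal{F}) \to (\Omega_j, \mathcal{F}_j)$ is the $j$th coordinate of $X$, then $\sigma(X)$ is the smallest
$\sigma$-field $\mathcal{G}$ of subsets of $\Omega$ making each $X_j$ measurable relative to $\mathcal{G}$ and $\mathcal{F}_j$.

(c) If $Z: (\Omega, \sigma(X)) \to (\bar{R}, \bar{\mathcal{B}})$ (or $(R, \mathcal{B})$), then $Z = f \circ X$ for some
$f: (\Omega', \mathcal{F}') \to (\bar{R}, \bar{\mathcal{B}})$. Conversely, if $Z = f \circ X$ and $f: (\Omega', \mathcal{F}') \to (\bar{R}, \bar{\mathcal{B}})$,
then $Z: (\Omega, \sigma(X)) \to (\bar{R}, \bar{\mathcal{B}})$.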
Proof. (a) If $A \in \mathcal{F}'$, then $X^{-1}(A) \in \sigma(X)$ by definition of $\sigma(X)$; hence
$\sigma(X)$ makes $X$ measurable. If $\mathcal{G}$ is any $\sigma$-field making $X$ measurable, $X^{-1}(A) \in \mathcal{G}$
for all $A \in \mathcal{F}'$; hence $\sigma(X) \subset \mathcal{G}$.

(b) By 4.11.3, $X$ is measurable relative to $\mathcal{G}$ and $\mathcal{F}'$ iff each $X_j$ is measurable
relative to $\mathcal{G}$ and $\mathcal{F}_j$. The result follows from (a).

(c) Assume $Z: (\Omega, \sigma(X)) \to (\bar{R}, \bar{\mathcal{B}})$. If $Z$ is an indicator $I_C$, $C \in \sigma(X)$,
then $C = X^{-1}(A)$ for some $A \in \mathcal{F}'$. If $f = I_A$, then $f \circ X = I_{\{X \in A\}} = I_C = Z$.
If $Z = \sum_{k=1}^n z_k I_{C_k}$ is a finite-valued simple function and $I_{C_k} = f_k \circ X$ as above,
then $Z = f \circ X$, where $f = \sum_{k=1}^n z_k f_k$.

In general, let $Z_1, Z_2, \ldots$ be finite-valued simple functions such that
$Z_n \to Z$. We can express $Z_n = f_n \circ X$ as above; define $f = \lim_{n \to \infty} f_n$ where
the limit exists, and 0 elsewhere. Then

\[
Z(\omega) = \lim_{n \to \infty} Z_n(\omega) = \lim_{n \to \infty} f_n(X(\omega)) = f(X(\omega)).
\]

The converse holds because a composition of measurable functions is measurable. □
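
The factorization in 5.4.2(c) can be made concrete on a finite sample space, where a function is $\sigma(X)$-measurable exactly when it is constant on each level set $\{X = x\}$. The following is a minimal sketch under that finite assumption; the helper name `factor_through` and the sample data are hypothetical and chosen only for illustration.

```python
def factor_through(X, Z):
    """Return a dict f with Z = f o X, for dicts X and Z on one finite sample space.

    Raises ValueError when Z is not constant on some level set {X = x},
    that is, when Z is not sigma(X)-measurable.
    """
    f = {}
    for w, x in X.items():
        if x in f and f[x] != Z[w]:
            raise ValueError(f"Z is not constant on the level set X = {x!r}")
        f[x] = Z[w]
    return f


# Hypothetical example: Omega = {0, ..., 5}, X(w) = w mod 3, Z(w) = (w mod 3)**2.
omega = range(6)
X = {w: w % 3 for w in omega}
Z = {w: (w % 3) ** 2 for w in omega}

f = factor_through(X, Z)
assert all(Z[w] == f[X[w]] for w in omega)  # Z = f o X

# W(w) = w separates points that X identifies, so W does not factor through X.
W = {w: w for w in omega}
try:
    factor_through(X, W)
    factored = True
except ValueError:
    factored = False
```

Here `f` is defined only on the range of $X$; on the rest of $\Omega'$ it may be given any fixed value, much as the proof sets $f = 0$ where the limit fails to exist.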


Note that 5.4.2(b) holds equally well for uncountable product spaces, be- 
cause, as we observed at the time, 4.11.3 extends to arbitrary products. For 
an extension of 5.4.2(c), see Problem 1. 

Now let us return to Eq. (3) at the beginning of this section:

\[
\int_C h \, dP = \int_C Y \, dP, \quad C \in \sigma(X),
\]

where $h = g \circ X$, $g(x) = E(Y \mid X = x)$. Since $g: (\Omega', \mathcal{F}') \to (\bar{R}, \bar{\mathcal{B}})$, we have
$h: (\Omega, \sigma(X)) \to (\bar{R}, \bar{\mathcal{B}})$ by 5.4.2(c). This fact, along with (3), characterizes
$h$, and gives us the concept of conditional expectation given a $\sigma$-field.


5.4.3 Theorem. Let $Y$ be an extended random variable on $(\Omega, \mathcal{F}, P)$, $\mathcal{G}$ a
sub $\sigma$-field of $\mathcal{F}$. Assume that $E(Y)$ exists. Then there is a function $E(Y \mid \mathcal{G})$:
$(\Omega, \mathcal{G}) \to (\bar{R}, \bar{\mathcal{B}})$, called the conditional expectation of $Y$ given $\mathcal{G}$, such that

\[
\int_C Y \, dP = \int_C E(Y \mid \mathcal{G}) \, dP \quad \text{for each } C \in \mathcal{G}.
\]

Any two such functions must coincide a.e. $[P]$. [Note that we cannot simply
set $E(Y \mid \mathcal{G}) = Y$, as $E(Y \mid \mathcal{G})$ is required to be measurable relative to $\mathcal{G}$.]

Proof. Let $\lambda(C) = \int_C Y \, dP$, $C \in \mathcal{G}$. Then $\lambda$ is a countably additive set function
on $\mathcal{G}$, absolutely continuous with respect to $P$; the result follows from
the Radon-Nikodym theorem. □


5.4.4 Comment. If $g(x) = E(Y \mid X = x)$ and $h(\omega) = g(X(\omega))$, then by 5.4.3,
$h = E(Y \mid \mathcal{G})$, where $\mathcal{G} = \sigma(X)$; for convenience we shall usually write

\[
h = E(Y \mid X).
\]

[For example, if $E(Y \mid X = x) = x^2$, then $E(Y \mid X) = X^2$.]

We have seen that the conditional expectation $g(x) = E(Y \mid X = x)$, $x \in \Omega'$,
can be transferred to $\Omega$ by forming $h(\omega) = g(X(\omega))$. Conversely, any conditional
expectation $E(Y \mid \mathcal{G})$, $\mathcal{G}$ an arbitrary sub $\sigma$-field of $\mathcal{F}$, arises from a random
object $X$ in this way. Simply take $X: (\Omega, \mathcal{F}) \to (\Omega, \mathcal{G})$ to be the identity
map: $X(\omega) = \omega$, $\omega \in \Omega$. Then $X^{-1}(\mathcal{G}) = \mathcal{G}$, so if $g(x) = E(Y \mid X = x)$, then
$h = E(Y \mid \sigma(X)) = E(Y \mid \mathcal{G})$.

Now intuitively, $E(Y \mid \mathcal{G}) = E(Y \mid X)$ is the average value of $Y$, given that
$X$ is known. But what does it mean to "know" $X: (\Omega, \mathcal{F}) \to (\Omega, \mathcal{G})$? The
events involving $X$ are sets of the form $\{X \in G\}$, $G \in \mathcal{G}$, and since $X$ is the
identity map, $\{X \in G\} = G$. Since an event corresponds to a question that has
a yes or no answer, $E(Y \mid \mathcal{G})$ may be interpreted as the average value of $Y(\omega)$,
given that we know, for each $G \in \mathcal{G}$, whether or not $\omega \in G$. Some examples
may help to make this clear.


5.4.5 Example. (a) Let $X$ be discrete, with values $x_1, x_2, \ldots$; take
$\Omega' = \{x_1, x_2, \ldots\}$, with $\mathcal{F}'$ the class of all subsets of $\Omega'$, and assume
$P\{X = x_i\} > 0$ for all $i$. We have seen in 5.3.5(a) that

\[
g(x_i) = E(Y \mid X = x_i) = \frac{1}{P\{X = x_i\}} \int_{\{X = x_i\}} Y \, dP.
\]

Let $h = E(Y \mid X)$, that is, $h(\omega) = g(X(\omega))$. Then $h$ has the constant value $g(x_i)$
on the set $\{X = x_i\}$, and $\mathcal{G} = X^{-1}(\mathcal{F}')$ consists of all unions of the sets
$\{X = x_i\}$. Knowledge of the value of $X(\omega)$ is equivalent to knowledge, for
each $G \in \mathcal{G}$, of the membership or nonmembership of $\omega$ in $G$.

(b) Let $X$ and $Y$ be random variables with a joint density $f$.
Let $\Omega = R^2$, $\mathcal{F} = \mathcal{B}(R^2)$, $P(B) = \iint_B f(x, y) \, dx \, dy$, $B \in \mathcal{F}$, $X(x, y) = x$,
$Y(x, y) = y$. Take $\Omega' = R$, $\mathcal{F}' = \mathcal{B}(R)$. We have seen in 5.3.5(c) that
$g(x) = E(Y \mid X = x) = \int_{-\infty}^{\infty} y \, h_0(y \mid x) \, dy$, where $h_0$ is the conditional density
of $Y$ given $X$. Let $h = E(Y \mid X)$, that is, $h(\omega) = g(X(\omega))$ or $h(x, y) = g(x)$.
Thus $E(Y \mid X)$ is constant on vertical strips; also, $\mathcal{G} = X^{-1}(\mathcal{F}')$ consists of all sets
$B \times R$, $B \in \mathcal{B}(R)$. Since $x \in B$ iff $(x, y) \in B \times R$, information about $X(\omega)$
[obtained intuitively by asking questions of the form "does $X(\omega)$ belong to
$B$?"] is equivalent to information about membership of $\omega$ in sets of $\mathcal{G}$.
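
The discrete case of part (a) can be checked by direct computation. The sketch below assumes a hypothetical six-point sample space with uniform $P$ (none of these choices come from the text); it builds $g$ and $h$ as in the example and verifies the defining relation (2) over every set $\{X \in A\}$, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model: Omega = {0, ..., 5} with uniform P.
omega = list(range(6))
P = {w: Fraction(1, 6) for w in omega}
X = {w: w % 2 for w in omega}   # X takes the values 0 and 1
Y = {w: w * w for w in omega}   # any real-valued Y will do


def integral(Z, event):
    """Integral of Z over a finite event with respect to P."""
    return sum(Z[w] * P[w] for w in event)


def g(x):
    """g(x) = E(Y | X = x) = (1 / P{X = x}) * integral of Y over {X = x}."""
    event = [w for w in omega if X[w] == x]
    return integral(Y, event) / sum(P[w] for w in event)


# h = E(Y | X) lives on Omega and is constant on each set {X = x_i}.
h = {w: g(X[w]) for w in omega}

# Relation (2): the integrals of h and Y agree on every set {X in A}.
values = sorted(set(X.values()))
for size in range(len(values) + 1):
    for A in combinations(values, size):
        event = [w for w in omega if X[w] in A]
        assert integral(h, event) == integral(Y, event)
```

Note that `h` generally differs from `Y` pointwise; only their integrals over sets of $\sigma(X)$ coincide.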
We also have a general concept of conditional probability given a $\sigma$-field.


5.4.6 Theorem. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\mathcal{G}$ a sub $\sigma$-field of
$\mathcal{F}$; fix $B \in \mathcal{F}$. There is a function $P(B \mid \mathcal{G})$: $(\Omega, \mathcal{G}) \to (R, \mathcal{B})$, called the
conditional probability of $B$ given $\mathcal{G}$, such that

\[
P(C \cap B) = \int_C P(B \mid \mathcal{G}) \, dP \quad \text{for each } C \in \mathcal{G}.
\]

Any two such functions must coincide a.e. $[P]$, and in fact

\[
P(B \mid \mathcal{G}) = E(I_B \mid \mathcal{G}) \quad \text{a.e. } [P].
\]

Proof. Let $\lambda(C) = P(C \cap B)$, $C \in \mathcal{G}$. Then $\lambda$ is a countably additive finite-valued
set function on $\mathcal{G}$, absolutely continuous with respect to $P$; the existence
and a.e. $[P]$ uniqueness of $P(B \mid \mathcal{G})$ follow from the Radon-Nikodym
theorem. (Since $\lambda$ is finite, the range of $P(B \mid \mathcal{G})$ may be taken as $R$ rather
than $\bar{R}$.) The connection between conditional probability and conditional
expectation follows from 5.4.3 with $Y = I_B$. □


5.4.7 Comment. If $g(x) = P(B \mid X = x)$, $X$ a random object, then
$g(x) = E(I_B \mid X = x)$ a.e. $[P_X]$ by 5.3.4. If $h(\omega) = g(X(\omega))$, then $h = E(I_B \mid \sigma(X))$
by 5.4.4, hence $h = P(B \mid \sigma(X))$ by 5.4.6. To summarize: If $g(x) = P(B \mid X = x)$
and $h(\omega) = g(X(\omega))$, then

\[
h = P(B \mid \mathcal{G}), \quad \mathcal{G} = \sigma(X).
\]

For convenience we shall usually write

\[
h = P(B \mid X).
\]


Problems 


1. Let $X: (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$ and $Z: (\Omega, \sigma(X)) \to (\Omega'', \mathcal{F}'')$. We investigate
conditions under which $Z = f \circ X$ for some $f: (\Omega', \mathcal{F}') \to (\Omega'', \mathcal{F}'')$.
By 5.4.2(c), such an $f$ can be found if $\Omega'' = R$, $\mathcal{F}'' = \mathcal{B}(R)$.

(a) Assume that $\mathcal{F}''$ separates points; in other words, if $a, b \in \Omega''$,
$a \neq b$, there are disjoint sets $A, B \in \mathcal{F}''$ such that $a \in A$, $b \in B$ (this
will always hold if $\mathcal{F}''$ contains all singletons). Show that there
is a function $f: \Omega' \to \Omega''$ (not necessarily measurable) such that
$Z = f \circ X$.

(b) Assume that $Z = f \circ X$, where $f: \Omega' \to \Omega''$. If $X(\Omega) \in \mathcal{F}'$ and
$f(\Omega' - X(\Omega))$ consists of a single point, show that $f$ is measurable
relative to $\mathcal{F}'$ and $\mathcal{F}''$.

Thus by (a) and (b), a measurable $f$ such that $Z = f \circ X$ can be
found if $\mathcal{F}''$ separates points and $X(\Omega) \in \mathcal{F}'$.


5.5 PROPERTIES OF CONDITIONAL EXPECTATION 

The conditional expectation $E(Y \mid X = x)$ is a more intuitive object than
the conditional expectation $E(Y \mid \mathcal{G})$; however, the intuition cannot easily be
pushed beyond the case in which $X$ is a finite-dimensional random vector.
Thus in formal arguments in which $\mathcal{G}$ is an arbitrary $\sigma$-field, we are forced to
use $E(Y \mid \mathcal{G})$.

For convenience, we develop the basic properties of conditional expectation
in pairs, one argument for $E(Y \mid \mathcal{G})$ and another (usually very similar)
for $E(Y \mid X = x)$. Theorems about conditional probabilities are obtained by
replacing $Y$ by $I_B$, and results concerning $E(Y \mid X)$ are obtained by setting
$\mathcal{G} = \sigma(X)$.

In the discussion to follow, $Y, Y_1, Y_2, \ldots$ are extended random variables
on $(\Omega, \mathcal{F}, P)$, with all expectations assumed to exist; $X: (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$
is a random object, and $\mathcal{G}$ is a sub $\sigma$-field of $\mathcal{F}$. The phrase "a.e." with no
measure specified will always mean a.e. $[P]$. If $Z: (\Omega, \mathcal{G}) \to (\bar{R}, \bar{\mathcal{B}})$, we
say that $Z$ is $\mathcal{G}$-measurable, and if $g: (\Omega', \mathcal{F}') \to (\bar{R}, \bar{\mathcal{B}})$, we say that $g$ is
$\mathcal{F}'$-measurable.


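5.5.1 Theorem. If $Y$ is a constant $k$ a.e., then
(a) $E(Y \mid \mathcal{G}) = k$ a.e.
(a′) $E(Y \mid X = x) = k$ a.e. $[P_X]$.
If $Y_1 \le Y_2$ a.e., then
(b) $E(Y_1 \mid \mathcal{G}) \le E(Y_2 \mid \mathcal{G})$ a.e.
(b′) $E(Y_1 \mid X = x) \le E(Y_2 \mid X = x)$ a.e. $[P_X]$.

[A statement such as $E(Y_1 \mid \mathcal{G}) \le E(Y_2 \mid \mathcal{G})$ a.e. means that if $Z_j$ is a version
of $E(Y_j \mid \mathcal{G})$, in other words, $Z_j$ satisfies the defining requirement 5.4.3, then
$Z_1 \le Z_2$ a.e.]

(c) $|E(Y \mid \mathcal{G})| \le E(|Y| \mid \mathcal{G})$ a.e.
(c′) $|E(Y \mid X = x)| \le E(|Y| \mid X = x)$ a.e. $[P_X]$.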


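Proof. (a) The function constant at $k$ is $\mathcal{G}$-measurable and

\[
\int_C Y \, dP = \int_C k \, dP, \quad C \in \mathcal{G}.
\]

(a′) If $g(x) = k$, $x \in \Omega'$, then $g$ is $\mathcal{F}'$-measurable and

\[
\int_{\{X \in A\}} Y \, dP = \int_A k \, dP_X, \quad A \in \mathcal{F}'.
\]

(b) $\int_C Y_1 \, dP \le \int_C Y_2 \, dP$; hence

\[
\int_C E(Y_1 \mid \mathcal{G}) \, dP \le \int_C E(Y_2 \mid \mathcal{G}) \, dP \quad \text{for each } C \in \mathcal{G}.
\]

The result follows from 1.6.11.

(b′) $\int_{\{X \in A\}} Y_1 \, dP \le \int_{\{X \in A\}} Y_2 \, dP$; hence

\[
\int_A E(Y_1 \mid X = x) \, dP_X \le \int_A E(Y_2 \mid X = x) \, dP_X \quad \text{for each } A \in \mathcal{F}'.
\]

The result follows from 1.6.11.

Parts (c) and (c′) follow from (b) and (b′), along with the observation that
$-|Y| \le Y \le |Y|$. □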

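We now prove an additivity theorem for conditional expectations.

5.5.2 Theorem. (a) If $a, b \in R$, and $aE(Y_1) + bE(Y_2)$ is well defined (not
of the form $\infty - \infty$), then $E(aY_1 + bY_2 \mid \mathcal{G}) = aE(Y_1 \mid \mathcal{G}) + bE(Y_2 \mid \mathcal{G})$ a.e.

(a′) If $a, b \in R$ and $aE(Y_1) + bE(Y_2)$ is well defined, then

\[
E(aY_1 + bY_2 \mid X = x) = aE(Y_1 \mid X = x) + bE(Y_2 \mid X = x) \quad \text{a.e. } [P_X].
\]

Proof. (a) If $C \in \mathcal{G}$,

\begin{align*}
\int_C (aY_1 + bY_2) \, dP &= \int_C aY_1 \, dP + \int_C bY_2 \, dP \\
&\qquad \text{by the additivity theorem for integrals} \\
&= \int_C aE(Y_1 \mid \mathcal{G}) \, dP + \int_C bE(Y_2 \mid \mathcal{G}) \, dP \\
&\qquad \text{by definition of conditional expectation.}
\end{align*}

Thus $\int_C aE(Y_1 \mid \mathcal{G}) \, dP + \int_C bE(Y_2 \mid \mathcal{G}) \, dP$ is well defined, so again by the
additivity theorem for integrals,

\[
\int_C (aY_1 + bY_2) \, dP = \int_C [aE(Y_1 \mid \mathcal{G}) + bE(Y_2 \mid \mathcal{G})] \, dP,
\]

as desired.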
(a′) This is done as in (a), with $C$ replaced by $\{X \in A\}$ and
$\int_C E(Y_j \mid \mathcal{G}) \, dP$ by $\int_A E(Y_j \mid X = x) \, dP_X(x)$.

In the future, we shall dispose of proofs of this type with a phrase such as
"same as (a)." □


The monotone convergence theorem and the fact that a nonnegative series 
can be integrated term by term have exact analogs for conditional expectations. 


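5.5.3 Theorem. If $Y_n \ge 0$ for all $n$ and $Y_n \uparrow Y$ a.e., then
(a) $E(Y_n \mid \mathcal{G}) \uparrow E(Y \mid \mathcal{G})$ a.e.
(a′) $E(Y_n \mid X = x) \uparrow E(Y \mid X = x)$ a.e. $[P_X]$.

If all $Y_n \ge 0$, then

(b) $E\left(\sum_{n=1}^{\infty} Y_n \,\middle|\, \mathcal{G}\right) = \sum_{n=1}^{\infty} E(Y_n \mid \mathcal{G})$ a.e.

(b′) $E\left(\sum_{n=1}^{\infty} Y_n \,\middle|\, X = x\right) = \sum_{n=1}^{\infty} E(Y_n \mid X = x)$ a.e. $[P_X]$.

In particular, if $B_1, B_2, \ldots$ are disjoint sets in $\mathcal{F}$,

(c) $P\left(\bigcup_{n=1}^{\infty} B_n \,\middle|\, \mathcal{G}\right) = \sum_{n=1}^{\infty} P(B_n \mid \mathcal{G})$ a.e.

(c′) $P\left(\bigcup_{n=1}^{\infty} B_n \,\middle|\, X = x\right) = \sum_{n=1}^{\infty} P(B_n \mid X = x)$ a.e. $[P_X]$.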


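Proof. (a) $\int_C Y_n \, dP = \int_C E(Y_n \mid \mathcal{G}) \, dP$, $C \in \mathcal{G}$; by 5.5.1(b), the $E(Y_n \mid \mathcal{G})$
increase to a $\mathcal{G}$-measurable function $h$. By the monotone convergence theorem,
$\int_C Y \, dP = \int_C h \, dP$; hence $h = E(Y \mid \mathcal{G})$ a.e.

(a′) Same as (a).

(b) By 5.5.2(a), $E\left(\sum_{k=1}^{n} Y_k \mid \mathcal{G}\right) = \sum_{k=1}^{n} E(Y_k \mid \mathcal{G})$ a.e. Let $n \to \infty$ and
apply part (a) to obtain the desired result.

(b′) Same as (b).

Finally, (c) is a special case of (b), and (c′) of (b′). □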
If we take the expectation of a conditional expectation, the result is the
same as if we were to take the expectation directly. This is actually a special
case of the defining equations of Theorems 5.4.3 and 5.3.3.


5.5.4 Theorem. (a) $E[E(Y \mid \mathcal{G})] = E(Y)$; hence if $Y$ is integrable, so is
$E(Y \mid \mathcal{G})$.

(a′) $\int_{\Omega'} E(Y \mid X = x) \, dP_X(x) = E(Y)$.

Proof. (a) $\int_\Omega Y \, dP = \int_\Omega E(Y \mid \mathcal{G}) \, dP$.

(a′) $\int_{\Omega'} E(Y \mid X = x) \, dP_X(x) = \int_{\{X \in \Omega'\}} Y \, dP = \int_\Omega Y \, dP$. □


We now prove the dominated convergence theorem for conditional expec- 
tations. 


5.5.5 Theorem. If $|Y_n| \le Z$ for all $n$, with $E(Z)$ finite, and $Y_n \to Y$ a.e., then

(a) $E(Y_n \mid \mathcal{G}) \to E(Y \mid \mathcal{G})$ a.e.
(a′) $E(Y_n \mid X = x) \to E(Y \mid X = x)$ a.e. $[P_X]$.

Proof. (a) Let $Z_n = \sup_{k \ge n} |Y_k - Y|$; then $Z_n \downarrow 0$ a.e. Now $E(Y_n \mid \mathcal{G})$ and
$E(Y \mid \mathcal{G})$ are a.e. finite by 5.5.4(a), and $|E(Y_n \mid \mathcal{G}) - E(Y \mid \mathcal{G})| \le E(|Y_n - Y| \mid \mathcal{G})$
by 5.5.1(c) and 5.5.2(a); this is less than or equal to $E(Z_n \mid \mathcal{G})$ by 5.5.1(b).
Thus it suffices to show that $E(Z_n \mid \mathcal{G}) \to 0$ a.e. By 5.5.1(b), $E(Z_n \mid \mathcal{G}) \downarrow h$ a.e.
for some $\mathcal{G}$-measurable function $h$. Since $0 \le Z_n \le 2Z$, which is integrable,
we have

\[
0 \le \int_\Omega h \, dP \le \int_\Omega E(Z_n \mid \mathcal{G}) \, dP = \int_\Omega Z_n \, dP \to 0
\]

by the dominated convergence theorem. Thus $h = 0$ a.e., as desired.
(a′) Same as (a). □


The extended monotone convergence theorem and Fatou’s lemma may be 
proved for conditional expectations, as follows. 


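5.5.6 Theorem. Assume $Y_n \ge Z$ for all $n$, where $E(Z) > -\infty$.

(a) If $Y_n \uparrow Y$ a.e., then $E(Y_n \mid \mathcal{G}) \uparrow E(Y \mid \mathcal{G})$ a.e.

(a′) If $Y_n \uparrow Y$ a.e., then $E(Y_n \mid X = x) \uparrow E(Y \mid X = x)$ a.e. $[P_X]$.

(b) $\liminf_{n \to \infty} E(Y_n \mid \mathcal{G}) \ge E(\liminf_{n \to \infty} Y_n \mid \mathcal{G})$ a.e.

(b′) $\liminf_{n \to \infty} E(Y_n \mid X = x) \ge E(\liminf_{n \to \infty} Y_n \mid X = x)$ a.e. $[P_X]$.

Now assume $Y_n \le Z$ for all $n$, where $E(Z) < +\infty$.

(c) If $Y_n \downarrow Y$ a.e., then $E(Y_n \mid \mathcal{G}) \downarrow E(Y \mid \mathcal{G})$ a.e.

(c′) If $Y_n \downarrow Y$ a.e., then $E(Y_n \mid X = x) \downarrow E(Y \mid X = x)$ a.e. $[P_X]$.

(d) $\limsup_{n \to \infty} E(Y_n \mid \mathcal{G}) \le E(\limsup_{n \to \infty} Y_n \mid \mathcal{G})$ a.e.

(d′) $\limsup_{n \to \infty} E(Y_n \mid X = x) \le E(\limsup_{n \to \infty} Y_n \mid X = x)$ a.e. $[P_X]$.


Proof. (a) If $C \in \mathcal{G}$, then $\int_C Y_n \, dP = \int_C E(Y_n \mid \mathcal{G}) \, dP$, and $E(Y_n \mid \mathcal{G})$
increases to a limit $h$ a.e. By the extended monotone convergence theorem,
$\int_C Y \, dP = \int_C h \, dP$, and therefore $h = E(Y \mid \mathcal{G})$ a.e.

(b) Let $Y_n' = \inf_{k \ge n} Y_k$; we then have $Y_n' \uparrow Y' = \liminf_{n \to \infty} Y_n$. By (a),
$E(Y_n' \mid \mathcal{G}) \uparrow E(Y' \mid \mathcal{G})$ a.e. But $Y_n' \le Y_n$, so

\begin{align*}
E(Y' \mid \mathcal{G}) &= \lim_{n \to \infty} E(Y_n' \mid \mathcal{G}) = \liminf_{n \to \infty} E(Y_n' \mid \mathcal{G}) \\
&\le \liminf_{n \to \infty} E(Y_n \mid \mathcal{G}) \quad \text{by 5.5.1(b)}.
\end{align*}

(c) This follows from (a) upon replacing all extended random variables
by their negatives.

(d)
\begin{align*}
E(\limsup_{n \to \infty} Y_n \mid \mathcal{G}) &= -E(\liminf_{n \to \infty} (-Y_n) \mid \mathcal{G}) \\
&\ge -\liminf_{n \to \infty} E(-Y_n \mid \mathcal{G}) \quad \text{by (b)} \\
&= \limsup_{n \to \infty} E(Y_n \mid \mathcal{G}).
\end{align*}

The proofs of (a′) to (d′) are the same as the proofs of (a) to (d). □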


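Thus far we have examined $E(Y \mid \mathcal{G})$ and $E(Y \mid X = x)$ under various
hypotheses on $Y$; now we impose conditions on $\mathcal{G}$ and $X$.

5.5.7 Theorem. (a) $E(Y \mid \{\emptyset, \Omega\}) = E(Y)$ a.e.

(a′) If $X$ is a constant $b$ a.e., then $E(Y \mid X = x) = E(Y)$ a.e. $[P_X]$.

(b) $E(Y \mid \mathcal{F}) = Y$ a.e.

(b′) If $X: (\Omega, \mathcal{F}) \to (\Omega, \mathcal{F})$ is the identity map, then $E(Y \mid X = \omega) = Y(\omega)$ a.e. $[P]$.


Proof. (a) $\int_C Y \, dP = \int_C E(Y) \, dP$ if $C = \emptyset$ or $\Omega$.

(a′) If $b \in A$,

\[
\int_{\{X \in A\}} Y \, dP = \int_\Omega Y \, dP = E(Y) = \int_A E(Y) \, dP_X.
\]

If $b \notin A$,

\[
\int_{\{X \in A\}} Y \, dP = \int_\emptyset Y \, dP = 0 = \int_A E(Y) \, dP_X.
\]

(b) $\int_C Y \, dP = \int_C Y \, dP$, $C \in \mathcal{F}$, and $Y$ is $\mathcal{F}$-measurable.

(b′) $\int_{\{X \in A\}} Y \, dP = \int_A Y \, dP$, and $Y$ is $\mathcal{F}'$ $(= \mathcal{F})$-measurable. □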


The following result is preparatory to the next theorem. 


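5.5.8 Lemma. If $f: (\Omega, \mathcal{G}) \to (\bar{R}, \bar{\mathcal{B}})$, $\mu$ is a measure on $\mathcal{G}$, and $B$ is an
atom of $\mathcal{G}$ relative to $\mu$ [that is, $B \in \mathcal{G}$, $\mu(B) > 0$, and if $A \in \mathcal{G}$, $A \subset B$, then
$\mu(A) = 0$ or $\mu(B - A) = 0$], then $f$ is a.e. constant on $B$.


Proof. If $x \in R$ and $\mu\{\omega \in B: f(\omega) < x\} = 0$, then $\mu\{\omega \in B: f(\omega) < y\} = 0$
for all $y < x$. Let $k = \sup\{x \in R: \mu\{\omega \in B: f(\omega) < x\} = 0\}$. Then

\[
\mu\{\omega \in B: f(\omega) < k\} = \mu\Big[\bigcup_{\substack{r \text{ rational} \\ r < k}} \{\omega \in B: f(\omega) < r\}\Big] = 0.
\]

If $x > k$, then $\mu\{\omega \in B: f(\omega) < x\} > 0$; hence $\mu\{\omega \in B: f(\omega) \ge x\} = 0$
since $B$ is an atom. Thus

\[
\mu\{\omega \in B: f(\omega) > k\} = \mu\Big[\bigcup_{\substack{r \text{ rational} \\ r > k}} \{\omega \in B: f(\omega) \ge r\}\Big] = 0.
\]

It follows that $f = k$ a.e. on $B$. □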


We now show that conditional expectation is an "averaging" or "smoothing"
operation; if $B$ is an atom of $\mathcal{G}$, $E(Y \mid \mathcal{G}) = k$ a.e. on $B$, where $k$ is the average
value of $Y$ on $B$.


5.5.9 Theorem. (a) Let $B$ be an atom of $\mathcal{G}$ relative to $P$. Then

\[
E(Y \mid \mathcal{G}) = \frac{1}{P(B)} \int_B Y \, dP = \frac{E(Y I_B)}{P(B)} \quad \text{a.e. on } B.
\]

As a special case, let $B_1, B_2, \ldots$ be disjoint sets in $\mathcal{F}$ whose union is $\Omega$, with
$P(B_n) > 0$ for all $n$. Let $\mathcal{G}$ be the minimal $\sigma$-field over the $B_n$, so that $\mathcal{G}$ is
the collection of all unions formed from the $B_n$. Then

\[
E(Y \mid \mathcal{G}) = \frac{1}{P(B_n)} \int_{B_n} Y \, dP \quad \text{a.e. on } B_n, \quad n = 1, 2, \ldots.
\]

(a′) If $B = \{X = x_0\}$ and $P(B) > 0$, then

\[
E(Y \mid X = x_0) = \frac{1}{P(B)} \int_B Y \, dP.
\]

Proof. (a) By 5.5.8, $E(Y \mid \mathcal{G})$ is a constant $k$ a.e. on $B$. Since $\int_A Y \, dP
= \int_A E(Y \mid \mathcal{G}) \, dP$ for all $A \in \mathcal{G}$, in particular for $A = B$, we have $\int_B Y \, dP
= kP(B)$, as asserted.

(a′) We first show that $B$ is an atom of $\mathcal{G} = \sigma(X)$. For assume
$A \in \mathcal{F}'$ and $X^{-1}(A) \subset B = X^{-1}\{x_0\}$. If $x_0 \in A$, then $X^{-1}\{x_0\} \subset X^{-1}(A)$; hence
$X^{-1}(A) = B$. If $x_0 \notin A$, then $X^{-1}(A) \cap X^{-1}\{x_0\} = \emptyset$; but $X^{-1}(A) \subset X^{-1}\{x_0\}$, so
$X^{-1}(A) = \emptyset$.

Now let $g(x) = E(Y \mid X = x)$, $h(\omega) = g(X(\omega)) = E(Y \mid \mathcal{G})(\omega)$. By (a),
$h(\omega) = k = E(Y I_B)/P(B)$ a.e. on $B$. If $\omega$ is any point of $B$ such that $h(\omega) = k$,
then $g(X(\omega)) = g(x_0) = k$, the desired result. □
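
The special case of 5.5.9(a), in which $\mathcal{G}$ is generated by a partition, is easy to compute on a finite sample space. The sketch below is illustrative only: the helper names `cond_exp` and `expectation` and the four-point probability space are assumptions, not part of the text. It averages $Y$ over each block and confirms 5.5.4(a) for this example.

```python
from fractions import Fraction


def cond_exp(Y, P, blocks):
    """E(Y | G) for the sigma-field G generated by a finite partition `blocks`.

    On each block B_n (with P(B_n) > 0) the result is the constant
    (1 / P(B_n)) * integral of Y over B_n.
    """
    h = {}
    for block in blocks:
        mass = sum(P[w] for w in block)
        average = sum(Y[w] * P[w] for w in block) / mass
        for w in block:
            h[w] = average
    return h


def expectation(Z, P):
    """E(Z) on a finite sample space."""
    return sum(Z[w] * P[w] for w in P)


# Hypothetical data: a non-uniform P on Omega = {a, b, c, d}.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
Y = {"a": 4, "b": 0, "c": 8, "d": -8}
blocks = [["a", "b"], ["c", "d"]]

h = cond_exp(Y, P, blocks)

# 5.5.4(a): taking the expectation of E(Y | G) recovers E(Y).
assert expectation(h, P) == expectation(Y, P)
```

Each block is an atom of $\mathcal{G}$ in the sense of 5.5.8, which is why `h` is constant there.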


We now consider successive conditioning relative to two $\sigma$-fields, one of
which is coarser than (that is, a subset of) the other. The result is that no
matter which conditioning operation is applied first, the result is the same as
the conditioning with respect to the coarser $\sigma$-field alone. This is intuitively
reasonable; for example, to find the average value of a real-valued function
$f$ defined on $[0, 3]$, we may compute $a_1$, the average of $f$ on $[0, 1]$, and $a_2$,
the average of $f$ on $[1, 3]$; the average of $f$ on $[0, 3]$, namely, $\frac{1}{3} \int_0^3 f(x) \, dx$,
is then $\frac{1}{3} a_1 + \frac{2}{3} a_2$.
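
As a concrete instance of this averaging argument (the choice $f(x) = x$ is an illustrative assumption, not taken from the text), the two-stage average and the direct average agree:

\[
a_1 = \int_0^1 x \, dx = \tfrac{1}{2}, \qquad
a_2 = \tfrac{1}{2} \int_1^3 x \, dx = 2, \qquad
\tfrac{1}{3} \int_0^3 x \, dx = \tfrac{3}{2} = \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{2}{3} \cdot 2.
\]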


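5.5.10 Theorem. (a) If $\mathcal{G}_1 \subset \mathcal{G}_2$, then $E[E(Y \mid \mathcal{G}_2) \mid \mathcal{G}_1] = E(Y \mid \mathcal{G}_1)$ a.e.

(a′) If $f: (\Omega', \mathcal{F}') \to (\Omega'', \mathcal{F}'')$, then $E[E(Y \mid X) \mid f \circ X] = E(Y \mid f \circ X)$ a.e.

(b) If $\mathcal{G}_1 \subset \mathcal{G}_2$, then $E[E(Y \mid \mathcal{G}_1) \mid \mathcal{G}_2] = E(Y \mid \mathcal{G}_1)$ a.e.

(b′) If $f: (\Omega', \mathcal{F}') \to (\Omega'', \mathcal{F}'')$, then $E[E(Y \mid f \circ X) \mid X] = E(Y \mid f \circ X)$ a.e.


Proof. (a) If $C \in \mathcal{G}_1$, then $\int_C E(Y \mid \mathcal{G}_1) \, dP = \int_C Y \, dP = \int_C E(Y \mid \mathcal{G}_2) \, dP$
since $C \in \mathcal{G}_2$. Thus $E[E(Y \mid \mathcal{G}_2) \mid \mathcal{G}_1] = E(Y \mid \mathcal{G}_1)$ a.e. Alternatively, if $C \in \mathcal{G}_1$,
then $\int_C E[E(Y \mid \mathcal{G}_2) \mid \mathcal{G}_1] \, dP = \int_C E(Y \mid \mathcal{G}_2) \, dP = \int_C Y \, dP$ since $C \in \mathcal{G}_2$; thus
$E(Y \mid \mathcal{G}_1) = E[E(Y \mid \mathcal{G}_2) \mid \mathcal{G}_1]$ a.e.

(a′) Let $\mathcal{G}_2 = X^{-1}(\mathcal{F}')$, $\mathcal{G}_1 = (f \circ X)^{-1}(\mathcal{F}'') = X^{-1}(f^{-1}(\mathcal{F}''))$, and
apply (a).

(b) $E(Y \mid \mathcal{G}_1)$ is $\mathcal{G}_1$-measurable, hence $\mathcal{G}_2$-measurable, and

\[
\int_C E(Y \mid \mathcal{G}_1) \, dP = \int_C E(Y \mid \mathcal{G}_1) \, dP, \quad C \in \mathcal{G}_2.
\]

(b′) Take $\mathcal{G}_1$ and $\mathcal{G}_2$ as in (a′), and apply (b). □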
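
Both orders of successive conditioning can be checked numerically when the two $\sigma$-fields are generated by nested finite partitions. This is a sketch under assumed data (an eight-point space with non-uniform weights; the helper `cond_exp` is hypothetical); exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction


def cond_exp(Y, P, blocks):
    """E(Y | G) for the sigma-field G generated by a finite partition `blocks`."""
    h = {}
    for block in blocks:
        mass = sum(P[w] for w in block)
        average = sum(Y[w] * P[w] for w in block) / mass
        for w in block:
            h[w] = average
    return h


# Hypothetical data: Omega = {0, ..., 7} with P{w} proportional to w + 1.
omega = list(range(8))
P = {w: Fraction(w + 1, 36) for w in omega}
Y = {w: w ** 2 for w in omega}

fine = [[0, 1], [2, 3], [4, 5], [6, 7]]   # generates G_2
coarse = [[0, 1, 2, 3], [4, 5, 6, 7]]     # generates G_1, a sub sigma-field of G_2

direct = cond_exp(Y, P, coarse)
fine_then_coarse = cond_exp(cond_exp(Y, P, fine), P, coarse)   # part (a)
coarse_then_fine = cond_exp(cond_exp(Y, P, coarse), P, fine)   # part (b)

assert fine_then_coarse == direct
assert coarse_then_fine == direct
```

In part (b) the inner function is already constant on the coarse blocks, so averaging it over the finer blocks changes nothing.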


If we take the conditional expectation of a product of two random variables, 
under certain conditions one of the terms can be factored out, as follows. 


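5.5.11 Theorem. (a) If $Z$ is $\mathcal{G}$-measurable and both $Y$ and $YZ$ are integrable,
then $E(YZ \mid \mathcal{G}) = Z E(Y \mid \mathcal{G})$ a.e. In particular, $E(Z \mid \mathcal{G}) = Z$ a.e.

(a′) If $f: (\Omega', \mathcal{F}') \to (R, \mathcal{B})$ and both $Y$ and $Y(f \circ X)$ are integrable,
then $E(Y(f \circ X) \mid X = x) = f(x) E(Y \mid X = x)$ a.e. $[P_X]$. In particular,
$E(f \circ X \mid X = x) = f(x)$ a.e. $[P_X]$.


Proof. (a) If $Z$ is an indicator $I_B$, $B \in \mathcal{G}$, and $C \in \mathcal{G}$, we have

\begin{align*}
\int_C YZ \, dP &= \int_{C \cap B} Y \, dP = \int_{C \cap B} E(Y \mid \mathcal{G}) \, dP = \int_C I_B E(Y \mid \mathcal{G}) \, dP \\
&= \int_C Z E(Y \mid \mathcal{G}) \, dP,
\end{align*}

and $Z E(Y \mid \mathcal{G})$ is $\mathcal{G}$-measurable. Thus the result holds for indicators.
Now let $Z$ be simple, say

\[
Z = \sum_{j=1}^{n} z_j I_{B_j}, \quad \text{with } B_j \in \mathcal{G}.
\]

By 5.5.2(a),

\[
E(YZ \mid \mathcal{G}) = \sum_{j=1}^{n} z_j E(Y I_{B_j} \mid \mathcal{G}) = \sum_{j=1}^{n} z_j I_{B_j} E(Y \mid \mathcal{G}) = Z E(Y \mid \mathcal{G}).
\]

If $Z$ is an arbitrary $\mathcal{G}$-measurable function, let $Z_1, Z_2, \ldots$ be simple (and
$\mathcal{G}$-measurable) with $|Z_n| \le |Z|$ and $Z_n \to Z$. Now $E(YZ_n \mid \mathcal{G}) = Z_n E(Y \mid \mathcal{G})$
by what we have just proved, and $E(YZ_n \mid \mathcal{G}) \to E(YZ \mid \mathcal{G})$ by 5.5.5(a). (The
integrability of $YZ$ is used here.) Since $Y$ is integrable, so is $E(Y \mid \mathcal{G})$; hence
$E(Y \mid \mathcal{G})$ is finite a.e., and consequently $Z_n E(Y \mid \mathcal{G}) \to Z E(Y \mid \mathcal{G})$ a.e. [Note
that, for example, $1/n \to 0$ but $(1/n)(\infty) \not\to 0(\infty) = 0$; thus finiteness of
$E(Y \mid \mathcal{G})$ is important.] Therefore, $E(YZ \mid \mathcal{G}) = Z E(Y \mid \mathcal{G})$.

(a′) Let $f = I_B$, $B \in \mathcal{F}'$. Then

\begin{align*}
\int_{\{X \in A\}} Y(f \circ X) \, dP &= \int_{\{X \in A\}} Y I_{\{X \in B\}} \, dP = \int_{\{X \in A \cap B\}} Y \, dP \\
&= \int_{A \cap B} E(Y \mid X = x) \, dP_X(x) = \int_A f(x) E(Y \mid X = x) \, dP_X(x).
\end{align*}

Thus the result holds when $f$ is an indicator. Passage to simple functions and
then to arbitrary measurable functions is carried out just as in (a). □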
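
The factoring property 5.5.11(a) can likewise be observed on a finite sample space, where $Z$ is $\mathcal{G}$-measurable exactly when it is constant on each block of the generating partition. The data and the helper `cond_exp` below are hypothetical; the last lines show that the identity can fail when the factor is not $\mathcal{G}$-measurable.

```python
from fractions import Fraction


def cond_exp(Y, P, blocks):
    """E(Y | G) for the sigma-field G generated by a finite partition `blocks`."""
    h = {}
    for block in blocks:
        mass = sum(P[w] for w in block)
        average = sum(Y[w] * P[w] for w in block) / mass
        for w in block:
            h[w] = average
    return h


# Hypothetical data: uniform P on Omega = {0, ..., 5}, two blocks of three points.
omega = list(range(6))
P = {w: Fraction(1, 6) for w in omega}
blocks = [[0, 1, 2], [3, 4, 5]]
Y = {w: w for w in omega}
Z = {w: (2 if w < 3 else -1) for w in omega}   # constant on blocks: G-measurable

lhs = cond_exp({w: Y[w] * Z[w] for w in omega}, P, blocks)   # E(YZ | G)
EY = cond_exp(Y, P, blocks)
rhs = {w: Z[w] * EY[w] for w in omega}                       # Z E(Y | G)
assert lhs == rhs

# With the non-measurable factor Y in place of Z, the two sides differ.
lhs_bad = cond_exp({w: Y[w] * Y[w] for w in omega}, P, blocks)
rhs_bad = {w: Y[w] * EY[w] for w in omega}
assert lhs_bad != rhs_bad
```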


5.5.12 Comments. (a) Theorem 5.5.11(a) holds under the weaker assumption
that $E(Y)$ and $E(YZ)$ exist [and 5.5.11(a′) under the assumption that $E(Y)$
and $E[Y(f \circ X)]$ exist]; see Problem 4.

(b) Intuitively, if it is known that $X = x$, then $f \circ X$ may be treated as
a constant $f(x)$, so that $E(Y(f \circ X) \mid X = x)$ should be $f(x) E(Y \mid X = x)$ as
in 5.5.11(a′). A similar interpretation may be given to 5.5.11(a) as follows.
We can express $E(YZ \mid \mathcal{G})$ as $E(YZ \mid \sigma(X)) = E(YZ \mid X)$ for an appropriate
random object $X$ (see 5.4.4). Since $Z$ is $\sigma(X)$-measurable, $Z = f \circ X$ for some
$f: (\Omega', \mathcal{F}') \to (R, \mathcal{B})$ by 5.4.2(c). Thus if $X$ is given, $Z$ may be treated as
a constant and factored out.


Problems 


1. (a) If $X$ is a random object, $Y$ a random variable such that $E(Y)$ exists,
and $X$ and $Y$ are independent, show that $E(Y \mid X) = E(Y)$ a.e. $[P]$,
and $E(Y \mid X = x) = E(Y)$ a.e. $[P_X]$.

(b) Give an example of integrable random variables $X$ and $Y$ such that
$E(Y \mid X) = E(Y)$, but $X$ and $Y$ are not independent.

2. If $Y$ is an integrable random variable and $X$ and $Z$ are random objects,
show that if $(X, Y)$ and $Z$ are independent, then $E(Y \mid X, Z) = E(Y \mid X)$.

3. This problem illustrates how to obtain a theorem about $E(Y \mid X = x)$ from
a corresponding theorem about $E(Y \mid X)$ without writing a separate proof
or saying "similarly." As a typical example, suppose we have proved that
if $Y_1 \le Y_2$ a.e., then $E(Y_1 \mid X) \le E(Y_2 \mid X)$ a.e. Show that

\[
P\{E(Y_1 \mid X) > E(Y_2 \mid X)\} = P_X\{x: E(Y_1 \mid X = x) > E(Y_2 \mid X = x)\}.
\]

Conclude that if $Y_1 \le Y_2$ a.e., then $E(Y_1 \mid X = x) \le E(Y_2 \mid X = x)$ a.e. $[P_X]$.

4. Extend Theorem 5.5.11 to the case where $E(Y)$ and $E(YZ)$ exist but are
not necessarily finite.

5. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $\mathcal{G}$ a sub $\sigma$-field of $\mathcal{F}$. If
$Y \in L^1(\Omega, \mathcal{F}, P)$, then $E(Y \mid \mathcal{G})$ is also in $L^1(\Omega, \mathcal{F}, P)$, by 5.5.4(a). Thus
$A(Y) = E(Y \mid \mathcal{G})$ defines a linear operator on $L^1$, and $A$ may be transferred
to the Banach space $L^1(\Omega, \mathcal{F}, P)$.

(a) Show that $\|A\| = 1$.

(b) Define $\langle X, Y \rangle = \int_\Omega XY \, dP$, $X \in L^1$, $Y \in L^\infty$. Show that $A$ has the
"self-adjointness" property $\langle AX, Y \rangle = \langle X, AY \rangle$. [Note that $L^\infty \subset L^1$,
and if $Y \in L^\infty$, then $AY \in L^\infty$, so $\langle X, AY \rangle$ is well defined.]


5.6 REGULAR CONDITIONAL PROBABILITIES 
We have seen that if $B_1, B_2, \ldots$ are disjoint sets in $\mathcal{F}$, then

\[
P\Big(\bigcup_{n=1}^{\infty} B_n \,\Big|\, \mathcal{G}\Big) = \sum_{n=1}^{\infty} P(B_n \mid \mathcal{G}) \quad \text{a.e.}
\]

This does not imply that we will be able to choose $P(B \mid \mathcal{G})(\omega)$, $B \in \mathcal{F}$, $\omega \in \Omega$,
so that it is a measure in $B$ for all (or almost all) $\omega \in \Omega$. To clarify the problem,
suppose that for each $B \in \mathcal{F}$, we choose a particular version of the conditional
probability $P(B \mid \mathcal{G})(\omega)$, $\omega \in \Omega$; we now have a number $P(B \mid \mathcal{G})(\omega)$ for each
$B \in \mathcal{F}$ and $\omega \in \Omega$. The difficulty is that for a fixed $\omega$, $P(\cdot \mid \mathcal{G})(\omega)$ need not be
countably additive on $\mathcal{F}$. Suppose that $B_1, B_2, \ldots$ are disjoint sets in $\mathcal{F}$. Then

\[
P\Big(\bigcup_{n=1}^{\infty} B_n \,\Big|\, \mathcal{G}\Big)(\omega) = \sum_{n=1}^{\infty} P(B_n \mid \mathcal{G})(\omega)
\]

except for $\omega$ belonging to a set $N(B_1, B_2, \ldots)$ of probability 0. Thus the set
of $\omega$'s for which countable additivity fails is

\[
M = \bigcup \{N(B_1, B_2, \ldots): B_1, B_2, \ldots \text{ disjoint sets in } \mathcal{F}\}.
\]

In general, $M$ is an uncountable union of sets of probability 0, and therefore $M$
need not have probability 0 (or even be in $\mathcal{F}$). Thus there is no guarantee that
we can specify the $P(B \mid \mathcal{G})$ to be countably additive in $B$. [Similarly, there is
no guarantee that $P(B \mid X = x)$ will be countably additive in $B$ for $P_X$-almost
all $x \in \Omega'$.]

We are going to establish conditions under which the countable additivity
requirement can be met.


5.6.1 Definition. Let $Y$ be a random variable on $(\Omega, \mathcal{F}, P)$, $\mathcal{G}$ a sub $\sigma$-field
of $\mathcal{F}$. The function $F = F(\omega, y)$, $\omega \in \Omega$, $y \in R$, is called a regular conditional
distribution function for $Y$ given $\mathcal{G}$ iff the following two conditions are
satisfied.

1. For each $\omega$, $F(\omega, \cdot)$ is a proper distribution function on $R$, that is,
increasing and right-continuous, with $F(\omega, \infty) = 1$, $F(\omega, -\infty) = 0$.

2. For each $y$, $F(\omega, y) = P\{Y \le y \mid \mathcal{G}\}(\omega)$ for almost every $\omega$.

The key step in the development is the result that a regular conditional
distribution function for a given random variable always exists.


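5.6.2 Theorem. If $Y$ is a random variable on $(\Omega, \mathcal{F}, P)$, $\mathcal{G}$ a sub $\sigma$-field of
$\mathcal{F}$, there is always a regular conditional distribution function for $Y$ given $\mathcal{G}$.


Proof. Select a version of $F_r(\omega) = P\{Y \le r \mid \mathcal{G}\}(\omega)$ for each fixed rational
$r$. If $r_1, r_2, \ldots$ is an enumeration of the rationals, let

\[
A_{ij} = \{\omega: F_{r_j}(\omega) < F_{r_i}(\omega)\}, \qquad A = \bigcup \{A_{ij}: r_i < r_j\}.
\]

Then $P(A) = 0$ since $r_i < r_j$ implies $P\{Y \le r_i \mid \mathcal{G}\} \le P\{Y \le r_j \mid \mathcal{G}\}$ a.e. by
5.5.1(b).

Now let

\[
B_i = \Big\{\omega: \lim_{n \to \infty} F_{r_i + 1/n}(\omega) \neq F_{r_i}(\omega)\Big\}, \qquad B = \bigcup_{i=1}^{\infty} B_i.
\]

Since

\[
I_{\{Y \le r_i + 1/n\}} \downarrow I_{\{Y \le r_i\}} \quad \text{as } n \to \infty,
\]

5.5.5(a) yields

\[
F_{r_i + 1/n}(\omega) \to F_{r_i}(\omega) \quad \text{a.e.},
\]

hence $P(B) = 0$.

If $C = \{\omega: \lim_{n \to \infty} F_n(\omega) \neq 1\}$, then $P(C) = 0$ since $\{Y \le n\} \uparrow \Omega$, so that
$P\{Y \le n \mid \mathcal{G}\} \to 1$ a.e. Similarly, $D = \{\omega: \lim_{n \to -\infty} F_n(\omega) \neq 0\}$ has
probability 0.

Define

\[
F(\omega, y) =
\begin{cases}
\lim_{r \to y^+} F_r(\omega) & \text{if } \omega \notin A \cup B \cup C \cup D, \\
G(y) & \text{if } \omega \in A \cup B \cup C \cup D,
\end{cases}
\]

where $G$ is any proper distribution function.

Then $F$ is well defined, for if $\omega \notin A$, then $F_r(\omega)$ is monotone in $r$, so that
$\lim_{r \to y^+} F_r(\omega)$ exists. Note also that if $\omega \notin A \cup B$, then $\lim_{r \to y^+} F_r(\omega)
= F_y(\omega)$ if $y$ is rational, so that in this case, $F(\omega, y) = F_y(\omega)$. Similarly, if
$\omega \notin A \cup C \cup D$, then $\lim_{r \to \infty} F_r(\omega) = 1$, $\lim_{r \to -\infty} F_r(\omega) = 0$.

We show that $F$ is a regular conditional distribution function for $Y$ given $\mathcal{G}$.
Fix $\omega \notin A \cup B \cup C \cup D$; then $F(\omega, \cdot)$ is clearly increasing. If $y < y' < r$, then
$F(\omega, y) \le F(\omega, y') \le F(\omega, r) = F_r(\omega) \to F(\omega, y)$ as $r \to y$. Thus $F(\omega, \cdot)$ is
right-continuous. If $r < y$, then $F(\omega, y) \ge F(\omega, r) = F_r(\omega) \to 1$ as $r \to \infty$;
hence $F(\omega, y) \to 1$ as $y \to \infty$; similarly, $F(\omega, y) \to 0$ as $y \to -\infty$. Thus
the first requirement is satisfied.

Now $P\{Y \le r \mid \mathcal{G}\}(\omega) = F_r(\omega) = F(\omega, r)$ a.e. by construction of $F$. As $r \downarrow y$,
$F(\omega, r) \to F(\omega, y)$ for all $\omega$ by right-continuity, and

\[
P\{Y \le r \mid \mathcal{G}\} \to P\{Y \le y \mid \mathcal{G}\} \quad \text{a.e.}
\]

by 5.5.5(a). Thus $P\{Y \le y \mid \mathcal{G}\}(\omega) = F(\omega, y)$ for almost every $\omega$ ($y$ fixed),
establishing the second requirement. □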


We are going to show that if $Y$ is a random variable, $P\{Y \in B \mid \mathcal{G}\}$ can be
chosen so as to be countably additive in $B$. This will follow from 5.6.2 and
the fact that a distribution function determines a unique Lebesgue-Stieltjes
measure.


5.6.3 Definition. Let $Y: (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$ be a random object, and $\mathcal{G}$
a sub $\sigma$-field of $\mathcal{F}$. The function $Q: \Omega \times \mathcal{F}' \to [0, 1]$ is called a regular
conditional probability for $Y$ given $\mathcal{G}$ iff

(1) $Q(\omega, B)$ is a probability measure in $B$ for each fixed $\omega \in \Omega$, and
(2) for each fixed $B \in \mathcal{F}'$, $Q(\omega, B) = P\{Y \in B \mid \mathcal{G}\}(\omega)$ a.e.

If $Y$ is a random variable, so that $\Omega' = R$, $\mathcal{F}' = \mathcal{B}(R)$, a regular conditional
probability for $Y$ given $\mathcal{G}$ always exists.


5.6.4 Theorem. Let $Y$ be a random variable on $(\Omega, \mathcal{F}, P)$, $\mathcal{G}$ a sub $\sigma$-field
of $\mathcal{F}$. There exists a regular conditional probability for $Y$ given $\mathcal{G}$.


Proof. Let $F$ be a regular conditional distribution function for $Y$ given $\mathcal{G}$.
Define

\[
Q(\omega, B) = \int_{y \in B} dF(\omega, y).
\]

Thus for each $\omega$, $Q(\omega, \cdot)$ is the Lebesgue-Stieltjes measure corresponding to
$F(\omega, \cdot)$; hence $Q$ is a probability measure in $B$ if $\omega$ is fixed.

Now let $\mathcal{C} = \{B \in \mathcal{B}(R): Q(\omega, B) = P\{Y \in B \mid \mathcal{G}\}(\omega) \text{ a.e.}\}$. Then $\mathcal{C}$ contains
all intervals $(-\infty, y]$ since $F(\omega, y) = P\{Y \le y \mid \mathcal{G}\}(\omega)$ a.e. If $A$,
$B \in \mathcal{C}$, $A \subset B$, then $B - A \in \mathcal{C}$, and it follows that $\mathcal{C}$ contains all intervals
$(a, b]$, hence all finite disjoint unions of right-semiclosed intervals. By the
monotone class theorem, $\mathcal{C} = \mathcal{B}(R)$. Thus $Q$ is a regular conditional probability
for $Y$ given $\mathcal{G}$. □
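
In the finite case no measure-theoretic difficulty arises, and the objects of 5.6.1 and 5.6.3 can be written down directly. The sketch below assumes a six-point sample space and a partition-generated $\mathcal{G}$ (all data and the names `Q`, `F`, `block_of` are illustrative assumptions); it builds $Q(\omega, B)$ by conditioning on the block containing $\omega$ and checks requirement (2) of 5.6.3 through the defining integral identity.

```python
from fractions import Fraction

# Hypothetical data: uniform P on Omega = {0, ..., 5}; G generated by two blocks.
omega = list(range(6))
P = {w: Fraction(1, 6) for w in omega}
blocks = [[0, 1, 2], [3, 4, 5]]
Y = {0: 1, 1: 1, 2: 3, 3: 2, 4: 3, 5: 3}

block_of = {w: block for block in blocks for w in block}


def Q(w, B):
    """Q(w, B) = P({Y in B} and block(w)) / P(block(w)); a measure in B for each w."""
    block = block_of[w]
    mass = sum(P[v] for v in block)
    return sum(P[v] for v in block if Y[v] in B) / mass


def F(w, y):
    """Conditional distribution function F(w, y) = Q(w, (-inf, y])."""
    return Q(w, {v for v in set(Y.values()) if v <= y})


# Requirement (2): the integral of Q(., B) over C equals P(C and {Y in B}), C a block.
for block in blocks:
    for B in [{1}, {2}, {3}, {1, 3}]:
        lhs = sum(Q(w, B) * P[w] for w in block)
        rhs = sum(P[w] for w in block if Y[w] in B)
        assert lhs == rhs
```

For each fixed `w`, `F(w, .)` is a step function that increases from 0 to 1, matching requirement 1 of 5.6.1.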


We now extend this result to objects Y more general than random variables. 


5.6.5 Theorem. Let $Y: (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$ be a random object, and $\mathcal{G}$ a sub
$\sigma$-field of $\mathcal{F}$. Suppose there is a map $\varphi: (\Omega', \mathcal{F}') \to (R, \mathcal{B}(R))$ such that $\varphi$ is
one-to-one, $E = \varphi(\Omega')$ is a Borel subset of $R$, and $\varphi^{-1}$ is measurable, that is,
$\varphi^{-1}: (E, \mathcal{B}(E)) \to (\Omega', \mathcal{F}')$. Then there is a regular conditional probability
for $Y$ given $\mathcal{G}$.


Proof. Let $Q_0 = Q_0(\omega, B)$, $B \in \mathcal{B}(R)$, $\omega \in \Omega$, be a regular conditional probability
for the random variable $\varphi(Y)$ given $\mathcal{G}$. Define $Q(\omega, A) = Q_0(\omega, \varphi(A))$,
$A \in \mathcal{F}'$; since $\varphi^{-1}$ is measurable, $\varphi(A) \in \mathcal{B}(E) \subset \mathcal{B}(R)$, and $Q$ is well defined.
Now $Q$ is a probability measure in $A$ for $\omega$ fixed, and if $A$ is fixed,
then

\[
Q(\omega, A) = P\{\varphi(Y) \in \varphi(A) \mid \mathcal{G}\}(\omega) = P\{Y \in A \mid \mathcal{G}\}(\omega) \quad \text{a.e.} \quad □
\]

A map $\varphi$ of the type described in 5.6.5 is called a Borel equivalence of $\Omega'$
and $E$.


Problems 


1. Let $X: (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$ and $Y: (\Omega, \mathcal{F}) \to (\Omega'', \mathcal{F}'')$ be random objects
on $(\Omega, \mathcal{F}, P)$, and assume that $P_x(B) = P\{Y \in B \mid X = x\}$ can be chosen
so as to be a probability measure in $B$ for each fixed $x$. If $g: (\Omega' \times \Omega'',
\mathcal{F}' \times \mathcal{F}'') \to (\bar{R}, \bar{\mathcal{B}})$ and $E[g(X, Y)]$ exists, show that

(a) $E[g(X, Y) \mid X = x] = \int_{\Omega''} g(x, y) \, dP_x(y)$ a.e. $[P_X]$.

In particular, if $C \in \mathcal{F}' \times \mathcal{F}''$, then

(b) $P\{(X, Y) \in C \mid X = x\} = P_x(C(x))$ a.e. $[P_X]$.

Conclude that

(c) $P\{(X, Y) \in C\} = \int_{\Omega'} P_x(C(x)) \, dP_X(x)$.

2. Let $Y_1, \ldots, Y_n$ be independent random variables, each uniformly distributed
between 0 and 1. Let $Z_k$ be the product $Y_1 \cdots Y_k$, $1 \le k \le n$. Show
that, given $Z_1 = z_1, \ldots, Z_k = z_k$, $Z_{k+1}$ is uniformly distributed between 0
and $z_k$, that is,

\[
P\{Z_{k+1} \in B \mid Z_1 = z_1, \ldots, Z_k = z_k\} = \int_B g_k(z) \, dz,
\]

where $g_k(z) = 1/z_k$, $0 \le z \le z_k$; $g_k(z) = 0$ elsewhere. (This is another way
of looking at Example 5.2.2.)


3. The following result is preparatory to the next problem; it is adapted from 
Halmos (1950). 


(a) Let E be a Lebesgue measurable subset of R with 0 < u(E) < œ, 
where u is Lebesgue measure. If 0 < 5 < 1, show that there is an 
open interval 7 such that u(E N Í) > êu(I). 

(b) With E as in part (a), let D(E) = {x — y: x, y € E}. Show that D(E) 
includes a neighborhood of 0. [This holds also if u (E) = co since 
0 < w(EM[—n,n]) < œ for some n.] 

(c) Let $\xi$ be an irrational number, and let $A = \{m + n\xi: m, n \text{ integers}\}$. Show that A is dense in $\mathbb{R}$. Equivalently, the set of numbers $n\xi$, n an integer, reduced modulo 1, is dense in [0, 1). If [0, 1) is identified with the unit circle under the correspondence $\theta \to e^{2\pi i\theta}$, the problem is as follows. If $\alpha/2\pi$ is irrational, the set $\{e^{in\alpha}: n \text{ an integer}\}$ is dense in the circle. In fact more can be proved. Let $z_0$ be any point on the circle and let $z_n = e^{in\alpha} z_0$ = $z_0$ rotated by $n\alpha$, $n = 1, 2, \ldots$. If z is an arbitrary point on the circle and $\varepsilon > 0$, then $|z_n - z| < \varepsilon$ for infinitely many values of n.


(d) If $\xi$ is irrational, let $B = \{m + n\xi: n \text{ an integer}, m \text{ an even integer}\}$, $C = \{m + n\xi: n \text{ an integer}, m \text{ an odd integer}\}$. Show that B and C are dense in $\mathbb{R}$.

(e) Define an equivalence relation on $\mathbb{R}$ by $x \sim y$ iff $x - y \in A$, where A is as defined in (c). Form a set $E_0$ by selecting one point from each distinct equivalence class. Show that $E_0$ is not a Lebesgue measurable set. [If F is a Borel set with $F \subset E_0$, show that $\mu(F) = 0$. Then show that $\mathbb{R}$ is a disjoint union of the sets $E_0 + a$, $a \in A$. Use the translation-invariance of Lebesgue measure to show that $E_0$ is not Lebesgue measurable.]


(f) Let $M = \{x + y: x \in E_0, y \in B\}$, $M' = \{x + y: x \in E_0, y \in C\}$. Show that $\mathbb{R} = M \cup M'$, and any Borel subset of M or of M' has Lebesgue measure 0.

(g) Let E be an arbitrary Lebesgue measurable subset of $\mathbb{R}$. Show that if F is a Borel subset of $E \cap M$, then $\mu(F) = 0$, and if G is a Borel set such that $E \cap M \subset G \subset E$, then $\mu(E - G) = 0$.


4. Let H be a subset of [0, 1] with inner Lebesgue measure 0 [$\sup\{\mu(B): B \text{ a Borel subset of } H\} = 0$, $\mu$ = Lebesgue measure] and outer Lebesgue measure 1 ($\inf\{\mu(B): B \text{ a Borel overset of } H\} = 1$). To construct such a set, take E = [0, 1], $H = E \cap M$ in Problem 3(g).

Let $\Omega = [0, 1]$, and let $\mathcal{F}$ consist of all sets $(B_1 \cap H) \cup (B_2 \cap H^c)$, where $B_1$, $B_2$ are Borel subsets of [0, 1]. Define

\[ P[(B_1 \cap H) \cup (B_2 \cap H^c)] = \tfrac{1}{2}\big(\mu(B_1) + \mu(B_2)\big). \]

Thus $P = \mu$ on $\mathcal{B}[0, 1]$ [if $B \in \mathcal{B}[0, 1]$, then $B = (B \cap H) \cup (B \cap H^c)$]. Take

\[ Y(\omega) = \omega, \quad \omega \in \Omega. \]

(a) Show that P is well defined, that is, if $(B_1 \cap H) \cup (B_2 \cap H^c) = (B_1' \cap H) \cup (B_2' \cap H^c)$, then $\mu(B_1) = \mu(B_1')$, $\mu(B_2) = \mu(B_2')$.

(b) Suppose that Q is a regular conditional probability for Y given $\mathcal{G} = \mathcal{B}[0, 1]$. Show that $Q(\omega, H) = Q(\omega, H^c) = \tfrac{1}{2}$ a.e.

(c) If $B \in \mathcal{G}$, show that $Q(\omega, B) = I_B(\omega)$ a.e.

(d) Show that $Q(\omega, \{\omega\}) = 1$ for almost every $\omega$, and thus arrive at a contradiction.

Conclude that there is no regular conditional probability for Y given $\mathcal{G}$.


Note: Books that discuss martingale theory must also treat conditional probability and expectation. Thus for a selected bibliography on the material in this chapter, see Section 6.10.


CHAPTER 6

STRONG LAWS OF LARGE NUMBERS AND MARTINGALE THEORY


6.1 INTRODUCTION 

At the end of Chapter 4, we indicated that the physical fact of convergence of the relative frequency of heads in coin tossing is best expressed as a statement about almost everywhere convergence of $S_n/n$, where $S_n$ is a sum of independent random variables $X_1, X_2, \ldots, X_n$; we attached the name "strong law of large numbers" to such a result. Now a "strong law of large numbers" in the most general sense is any statement about the almost everywhere convergence of a sequence of random variables, and this is the main subject matter of this chapter. A large class of convergence theorems will be developed with the aid of martingale theory, but before going into this it will be useful to consider the classical approach.

First we prove some results from real analysis that will be needed. 


6.1.1 Lemma. Let $A = [a_{nj}]$ be an infinite matrix of real numbers; assume that $a_{nj} \to 0$ as $n \to \infty$ for each fixed j, and that for some nonnegative real number c, $\sum_{j=1}^\infty |a_{nj}| \le c$ for all n. If $\{x_n\}$ is a bounded sequence of real numbers, define

\[ y_n = \sum_{j=1}^\infty a_{nj} x_j, \quad n = 1, 2, \ldots. \]

Then:
(a) If $x_n \to 0$, then $y_n \to 0$.
(b) If $\sum_{j=1}^\infty a_{nj} \to 1$ and $x_n \to x$ (x real), then $y_n \to x$.

PROOF. (a) We may write

\[ |y_n| \le \sum_{j=1}^N |a_{nj}|\,|x_j| + \sum_{j=N+1}^\infty |a_{nj}|\,|x_j|. \tag{1} \]

Given $\varepsilon > 0$, choose N so that $|x_j| < \varepsilon/c$ for $j > N$; the second term on the right-hand side of (1) is at most $c(\varepsilon/c) = \varepsilon$. Since the first term approaches 0 as $n \to \infty$ for any fixed N, it follows that $y_n \to 0$.

(b) Write $y_n - x = \sum_{j=1}^\infty a_{nj}(x_j - x) + x\big(\sum_{j=1}^\infty a_{nj} - 1\big)$. By (a), $\sum_{j=1}^\infty a_{nj}(x_j - x) \to 0$; the second term approaches 0 by hypothesis, and the result follows. □


6.1.2 Toeplitz Lemma. Let $\{a_n\}$ be a sequence of nonnegative real numbers, and let $b_n = \sum_{j=1}^n a_j$; assume $b_n > 0$ for all n, and $b_n \to \infty$ as $n \to \infty$. If $\{x_n\}$ is a sequence of real numbers converging to the real number x, then

\[ \frac{1}{b_n} \sum_{j=1}^n a_j x_j \to x. \]

PROOF. Form an infinite matrix A whose nth row is

\[ \left( \frac{a_1}{b_n}, \frac{a_2}{b_n}, \ldots, \frac{a_n}{b_n}, 0, 0, \ldots \right) \]

and apply 6.1.1(b). □
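As a numerical illustration of the Toeplitz lemma, the following sketch computes the weighted averages $(1/b_n)\sum_{j \le n} a_j x_j$ for one particular choice of weights and sequence. The helper name `toeplitz_average` and the choices $a_j = 1$, $x_j = 2 + (-1)^j/j$ are assumptions made only for this example; they are not taken from the text.

```python
def toeplitz_average(a, x):
    """Return the list of weighted averages (1/b_n) * sum_{j<=n} a_j x_j."""
    out, num, b = [], 0.0, 0.0
    for a_j, x_j in zip(a, x):
        num += a_j * x_j      # running numerator sum_{j<=n} a_j x_j
        b += a_j              # running b_n = a_1 + ... + a_n
        out.append(num / b)
    return out

N = 20000
a = [1.0] * N                                        # a_j = 1, so b_n = n -> infinity
x = [2.0 + (-1) ** j / j for j in range(1, N + 1)]   # x_j -> 2
avg = toeplitz_average(a, x)
print(avg[9], avg[999], avg[-1])                     # averages drift toward 2
```

With $a_j = 1$ the lemma reduces to the familiar statement that Cesàro averages of a convergent sequence converge to the same limit.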


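6.1.3 Kronecker Lemma. Let $\{b_n\}$ be an increasing sequence of positive real numbers with $b_n \to \infty$, and let $\{x_n\}$ be a sequence of real numbers with $\sum_{n=1}^\infty x_n = x$ (finite). Then

\[ \frac{1}{b_n} \sum_{j=1}^n b_j x_j \to 0 \quad \text{as } n \to \infty. \]

PROOF. If $s_n = \sum_{j=1}^n x_j$ (with $s_0 = 0$),

\[
\begin{aligned}
\sum_{j=1}^n b_j x_j &= \sum_{j=1}^n b_j (s_j - s_{j-1}) \\
&= b_n s_n - b_0 s_0 - \sum_{j=1}^n s_{j-1}(b_j - b_{j-1}),
\end{aligned}
\]

taking $b_0 = 0$. (This is the "summation by parts" formula; it is proved by brute force.) Thus

\[ \frac{1}{b_n} \sum_{j=1}^n b_j x_j = s_n - \frac{1}{b_n} \sum_{j=1}^n a_j s_{j-1}, \]

where $a_j = b_j - b_{j-1} \ge 0$. Since $s_n \to x$ as $n \to \infty$, and $s_{j-1} \to x$ as $j \to \infty$,

\[ \frac{1}{b_n} \sum_{j=1}^n b_j x_j \to 0 \quad \text{by 6.1.2.} \quad \square \]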
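The Kronecker lemma can be checked numerically on a simple case. The sketch below takes $b_n = n$ and $x_n = 1/n^2$ (so that $\sum x_n$ converges); both choices, and the helper name `kronecker_ratio`, are assumptions made for illustration only.

```python
def kronecker_ratio(b, x):
    """Return (1/b_n) * sum_{j<=n} b_j x_j, where n = len(b)."""
    return sum(bj * xj for bj, xj in zip(b, x)) / b[-1]

ratios = {}
for n in (10, 1000, 100000):
    b = [float(j) for j in range(1, n + 1)]        # b_j = j, increasing to infinity
    x = [1.0 / j ** 2 for j in range(1, n + 1)]    # sum of x_j converges
    ratios[n] = kronecker_ratio(b, x)
    print(n, ratios[n])                            # equals H_n / n, which tends to 0
```

Here the ratio is the harmonic number $H_n$ divided by n, so it decays roughly like $(\log n)/n$.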
Now if $S_n$ is a random variable with finite expectation, Chebyshev's inequality implies that

\[ P\{|S_n - E(S_n)| \ge \varepsilon\} \le \frac{1}{\varepsilon^2} E[(S_n - E(S_n))^2] = \frac{\operatorname{Var} S_n}{\varepsilon^2}. \]

If in fact $S_n$ is a sum of independent random variables, this result can be strengthened considerably, as follows.


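6.1.4 Kolmogorov's Inequality. Let $X_1, \ldots, X_n$ be independent random variables with finite expectation, and let $S_j = X_1 + \cdots + X_j$, $j = 1, \ldots, n$. Then for any $\varepsilon > 0$,

\[ P\Big\{ \max_{1 \le j \le n} |S_j - E(S_j)| \ge \varepsilon \Big\} \le \frac{\operatorname{Var} S_n}{\varepsilon^2}. \]

PROOF. We may assume without loss of generality that $E(X_j) = 0$, hence $E(S_j) = 0$, for all j. Let

\[ A_k = \{|S_j| < \varepsilon,\ j = 1, \ldots, k - 1,\ |S_k| \ge \varepsilon\}, \qquad A = \Big\{ \max_{1 \le j \le n} |S_j| \ge \varepsilon \Big\}; \]

A is the disjoint union of the $A_k$, $k = 1, \ldots, n$. Now

\[ \operatorname{Var} S_n = \int_\Omega S_n^2\, dP \ge \int_A S_n^2\, dP = \sum_{k=1}^n \int_{A_k} S_n^2\, dP. \tag{1} \]

But $S_n = S_k + Y_k$, where $Y_k = X_{k+1} + \cdots + X_n$; hence

\[ \int_{A_k} S_n^2\, dP = \int_{A_k} S_k^2\, dP + 2 \int_{A_k} S_k Y_k\, dP + \int_{A_k} Y_k^2\, dP. \tag{2} \]

The second term on the right-hand side of (2) is $2E[I_{A_k} S_k Y_k]$, which is 0 since $I_{A_k} S_k$ and $Y_k$ are functions of independent random variables, and are therefore independent [see 4.8.2(d) and 4.10.8]. Since the third term on the right-hand side of (2) is nonnegative, we have

\[ \int_{A_k} S_n^2\, dP \ge \int_{A_k} S_k^2\, dP \ge \varepsilon^2 P(A_k) \quad \text{by definition of } A_k. \]

By (1),

\[ \operatorname{Var} S_n \ge \varepsilon^2 \sum_{k=1}^n P(A_k) = \varepsilon^2 P(A). \quad \square \]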
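Kolmogorov's inequality can be verified exactly on a small example by enumerating all sample paths. The sketch below assumes a fair ±1 random walk (so $E(S_j) = 0$ and $\operatorname{Var} S_n = n$); the function name and the values n = 12, ε = 5 are choices made for this illustration.

```python
from itertools import product
from fractions import Fraction

def max_exceed_probability(n, eps):
    """Exact P{max_{1<=j<=n} |S_j| >= eps} for a fair +/-1 random walk."""
    hits = 0
    for signs in product((-1, 1), repeat=n):   # all 2^n equally likely paths
        s = 0
        for step in signs:
            s += step
            if abs(s) >= eps:                  # path belongs to some A_k
                hits += 1
                break
    return Fraction(hits, 2 ** n)

n, eps = 12, 5
lhs = max_exceed_probability(n, eps)
rhs = Fraction(n, eps ** 2)                    # Var S_n / eps^2, with Var S_n = n
print(lhs, "<=", rhs, lhs <= rhs)
```

The bound is the same one Chebyshev's inequality gives for $S_n$ alone, yet it controls the maximum over the whole path.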
We shall use the Borel–Cantelli lemma (2.2.4) quite often: If $A_1, A_2, \ldots$ are events such that $\sum_{n=1}^\infty P(A_n) < \infty$, then $\limsup_n A_n$ has probability 0. There is a partial converse which will also be needed [see (6.8.9)].


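6.1.5 Second Borel–Cantelli Lemma. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $A_1, A_2, \ldots$ be independent events in $\mathcal{F}$. If $\sum_{n=1}^\infty P(A_n) = \infty$, then $P(\limsup_n A_n) = 1$.

PROOF.

\[
P\Big(\limsup_n A_n\Big) = P\Big( \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k \Big) = \lim_{n \to \infty} P\Big( \bigcup_{k=n}^\infty A_k \Big) = \lim_{n \to \infty} \lim_{m \to \infty} P\Big( \bigcup_{k=n}^m A_k \Big).
\]

Now

\[
\begin{aligned}
P\Big[ \Big( \bigcup_{k=n}^m A_k \Big)^c \Big] = P\Big( \bigcap_{k=n}^m A_k^c \Big) &= \prod_{k=n}^m P(A_k^c) \quad \text{by independence} \\
&\le \prod_{k=n}^m \exp[-P(A_k)] \quad \text{since } P(A_k^c) = 1 - P(A_k) \le \exp[-P(A_k)] \\
&= \exp\Big[ -\sum_{k=n}^m P(A_k) \Big] \\
&\to 0 \text{ as } m \to \infty \text{ since } \sum_k P(A_k) = \infty. \quad \square
\end{aligned}
\]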
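The key estimate in the proof, $\prod (1 - p_k) \le \exp(-\sum p_k)$, can be seen numerically. The sketch below assumes independent events with $P(A_k) = 1/k$ for $k \ge n = 10$, a hypothetical choice for which the product telescopes to $(n-1)/m$.

```python
import math

def no_event_probability(p):
    """P(none of the independent events occurs) = prod_k (1 - p_k)."""
    prob = 1.0
    for pk in p:
        prob *= 1.0 - pk
    return prob

n = 10
for m in (100, 10_000, 1_000_000):
    p = [1.0 / k for k in range(n, m + 1)]    # sum of 1/k diverges as m grows
    exact = no_event_probability(p)           # telescopes to (n - 1) / m
    bound = math.exp(-sum(p))                 # exponential bound from the proof
    print(m, exact, bound)
```

Both columns tend to 0 as m grows, which is what forces $P(\bigcup_{k \ge n} A_k) = 1$ for every n.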


Problems 


1. (Extension of the second Borel–Cantelli lemma) Let $A_1, A_2, \ldots$ be events in a given probability space such that $\sum_{n=1}^\infty P(A_n) = \infty$ and

\[ \liminf_{n \to \infty} \frac{\sum_{j,k=1}^n P(A_j \cap A_k)}{\big( \sum_{k=1}^n P(A_k) \big)^2} = 1. \]

(a) Show that the lim inf condition above is satisfied if the $A_n$ are pairwise independent ($A_j$ and $A_k$ are independent whenever $j \ne k$) and $\sum_{n=1}^\infty P(A_n) = \infty$.

(b) Use Chebyshev's inequality to show that if $I_k = I_{A_k}$, then

\[ \liminf_{n \to \infty} P\Big\{ \Big| \sum_{k=1}^n I_k - \sum_{k=1}^n P(A_k) \Big| \ge \frac{1}{2} \sum_{k=1}^n P(A_k) \Big\} = 0. \]

(c) Conclude from (b) that there is a sequence of integers $n_1 < n_2 < \cdots$ such that with probability 1,

\[ \sum_{k=1}^{n_j} I_k \ge \frac{1}{2} \sum_{k=1}^{n_j} P(A_k) \quad \text{for sufficiently large } j. \]

(d) Show that $P(\limsup_n A_n) = 1$.


2. Let $\theta$ be uniformly distributed on $[0, 2\pi]$ and define $X_k = \sin k\theta$ ($k = 1, 2, \ldots$). Show that

\[ \frac{X_1 + \cdots + X_n}{n} \to 0 \quad \text{a.e.} \]

6.2 CONVERGENCE THEOREMS 

We are now in a position to establish several basic results on convergence 
of sequences of random variables. We start with an example that motivated 
some of the early work in this subject, the problem of random signs. Let $a_1, a_2, \ldots$ be a fixed sequence of real numbers, and let an unbiased coin be tossed independently over and over again. If the nth toss results in heads, we write down the number $+a_n$; if tails, the number $-a_n$. In this way we generate a series such as $a_1 - a_2 - a_3 + a_4 + \cdots$, where the signs are chosen at random. Will the series converge?

The general question suggested by the random signs problem involves the 
convergence of a series of independent random variables. The following result 
gives considerable information. 


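6.2.1 Theorem. Let $X_1, X_2, \ldots$ be independent random variables with finite expectation. If $\sum_{n=1}^\infty \operatorname{Var} X_n < \infty$, then $\sum_{n=1}^\infty [X_n - E(X_n)]$ converges a.e. [All random variables are assumed to be defined on a fixed probability space $(\Omega, \mathcal{F}, P)$, and "almost everywhere" refers to the probability measure P. Also, throughout this chapter, convergence will always mean to a finite limit.]

PROOF. We may assume that $E(X_n) = 0$. Let $S_n = X_1 + \cdots + X_n$. Then $S_n$ converges iff $S_j - S_k \to 0$ as $j, k \to \infty$, and this happens a.e. iff for each $\varepsilon > 0$,

\[ P\Big( \bigcup_{j,k \ge n} \{|S_j - S_k| \ge \varepsilon\} \Big) \to 0 \quad \text{as } n \to \infty \]

(see Section 2.5, Problem 4). Equivalently, we must prove that for each $\varepsilon > 0$,

\[ P\Big( \bigcup_{k=1}^\infty \{|S_{m+k} - S_m| \ge \varepsilon\} \Big) \to 0 \quad \text{as } m \to \infty. \]

We have

\[
\begin{aligned}
P\Big( \bigcup_{k=1}^\infty \{|S_{m+k} - S_m| \ge \varepsilon\} \Big)
&= \lim_{n \to \infty} P\Big( \bigcup_{k=1}^n \{|S_{m+k} - S_m| \ge \varepsilon\} \Big) \\
&= \lim_{n \to \infty} P\Big\{ \max_{1 \le k \le n} |S_{m+k} - S_m| \ge \varepsilon \Big\} \\
&\le \frac{1}{\varepsilon^2} \limsup_{n \to \infty} \operatorname{Var}(S_{m+n} - S_m) \quad \text{by 6.1.4} \\
&= \frac{1}{\varepsilon^2} \limsup_{n \to \infty} \sum_{j=1}^n \operatorname{Var}(X_{m+j}) \quad \text{by 4.10.11} \\
&\to 0 \text{ as } m \to \infty \text{ since } \sum_{n=1}^\infty \operatorname{Var} X_n < \infty. \quad \square
\end{aligned}
\]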


In the random signs problem we have $X_n = a_n Y_n$, where the $Y_n$ are independent, taking values +1 and −1 with equal probability. It follows that if $\sum_{n=1}^\infty a_n^2 < \infty$, the series $\sum_{n=1}^\infty X_n$ converges a.e. After we prove Theorem 6.8.7, we shall see that the condition $\sum_{n=1}^\infty a_n^2 < \infty$ is necessary as well as sufficient for a.e. convergence of the series.

If $X_1, X_2, \ldots$ are independent random variables, we proved in Chapter 4 that under appropriate conditions, $(S_n - E(S_n))/n$ converges to 0 in probability (the weak law of large numbers). We now consider almost everywhere convergence.


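6.2.2 Kolmogorov Strong Law of Large Numbers. Let $X_1, X_2, \ldots$ be independent random variables, each with finite mean and variance, and let $\{b_n\}$ be an increasing sequence of positive real numbers with $b_n \to \infty$. If

\[ \sum_{n=1}^\infty \frac{\operatorname{Var} X_n}{b_n^2} < \infty, \]

then (with $S_n = X_1 + \cdots + X_n$)

\[ \frac{S_n - E(S_n)}{b_n} \to 0 \quad \text{a.e.} \]

PROOF.

\[ \sum_{n=1}^\infty \operatorname{Var}\Big( \frac{X_n - E(X_n)}{b_n} \Big) = \sum_{n=1}^\infty \frac{\operatorname{Var} X_n}{b_n^2} < \infty \quad \text{by hypothesis.} \]

By 6.2.1, $\sum_{n=1}^\infty (X_n - E(X_n))/b_n$ converges a.e. But

\[ \frac{S_n - E(S_n)}{b_n} = \frac{1}{b_n} \sum_{k=1}^n b_k \Big( \frac{X_k - E(X_k)}{b_k} \Big), \]

and this approaches zero a.e. by 6.1.3. □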


In particular, if the $X_n$ are independent random variables, each with finite mean m and finite variance $\sigma^2$, then $S_n/n \to m$ a.e. (take $b_n = n$ in 6.2.2).

Another special case: If the $X_n$ are independent and the fourth central moments are uniformly bounded, that is, for some finite M we have $E[(X_n - E(X_n))^4] \le M$ for all n, then $(S_n - E(S_n))/n \to 0$ a.e. For by the Cauchy–Schwarz inequality,

\[ \operatorname{Var} X_n = E[(X_n - E(X_n))^2 \cdot 1] \le \big( E[(X_n - E(X_n))^4] \big)^{1/2} \le M^{1/2}, \]

and therefore 6.2.2 applies with $b_n = n$. This result, due to Cantelli, may in fact be proved without much machinery from measure theory; see Ash (1970, p. 206).

If the $X_n$ are independent and all have the same distribution, in other words, for each Borel set $B \subset \mathbb{R}$, $P\{X_n \in B\}$ is the same for all n, a version of the strong law of large numbers may be proved under a hypothesis on the mean of the $X_n$, but no assumptions about higher moments. We first indicate some terminology that will be used in the remainder of the book.


6.2.3 Definition. If the random variables $X_n$ all have the same distribution, they will be called identically distributed. The phrase "independent and identically distributed" will be abbreviated iid.


We need one preliminary result. 


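6.2.4 Lemma. If Y is a nonnegative random variable,

\[ \sum_{n=1}^\infty P\{Y \ge n\} \le E(Y) \le 1 + \sum_{n=1}^\infty P\{Y \ge n\}. \]

PROOF.

\[
\begin{aligned}
\sum_{n=1}^\infty P\{Y \ge n\}
&= \sum_{n=1}^\infty \sum_{k=n}^\infty P\{k \le Y < k + 1\} = \sum_{k=1}^\infty \sum_{n=1}^k P\{k \le Y < k + 1\} \\
&= \sum_{k=0}^\infty k\, P\{k \le Y < k + 1\} \le \sum_{k=0}^\infty \int_{\{k \le Y < k+1\}} Y\, dP = E(Y) \\
&\le \sum_{k=0}^\infty (k + 1) P\{k \le Y < k + 1\} \\
&= \sum_{k=0}^\infty k\, P\{k \le Y < k + 1\} + \sum_{k=0}^\infty P\{k \le Y < k + 1\} = \sum_{n=1}^\infty P\{Y \ge n\} + 1. \quad \square
\end{aligned}
\]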
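The two-sided bound of Lemma 6.2.4 can be confirmed in exact arithmetic for a random variable with finitely many values. The distribution below is a hypothetical example chosen for illustration, and the helper names are not from the text.

```python
from fractions import Fraction
import math

def tail_sum(dist):
    """sum_{n>=1} P{Y >= n} for a finite distribution {value: probability}."""
    top = math.floor(max(dist))              # P{Y >= n} = 0 for n > max value
    return sum(sum(p for y, p in dist.items() if y >= n)
               for n in range(1, top + 1))

def mean(dist):
    """E(Y) for a finite distribution {value: probability}."""
    return sum(y * p for y, p in dist.items())

# Hypothetical nonnegative Y: values 1/2, 7/3, 5 with probabilities 1/4, 1/2, 1/4.
dist = {Fraction(1, 2): Fraction(1, 4),
        Fraction(7, 3): Fraction(1, 2),
        Fraction(5): Fraction(1, 4)}
lower, ey = tail_sum(dist), mean(dist)
print(lower, "<=", ey, "<=", 1 + lower)
```

For integer-valued Y the left inequality becomes an equality, which is the familiar tail-sum formula for the mean.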


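6.2.5 Strong Law of Large Numbers, iid Case. If $X_1, X_2, \ldots$ are iid random variables with finite expectation m, and $S_n = X_1 + \cdots + X_n$, then $S_n/n \to m$ a.e.

PROOF. Since all $X_n$ have the same distribution,

\[ \sum_{n=1}^\infty P\{|X_n| \ge n\} = \sum_{n=1}^\infty P\{|X_1| \ge n\}; \]

thus

\[ \sum_{n=1}^\infty P\{|X_n| \ge n\} \le E(|X_1|) < \infty \quad \text{by 6.2.4.} \]

By the Borel–Cantelli lemma, $P\{|X_n| \ge n \text{ for infinitely many } n\} = 0$. Thus if we define $Y_n = X_n$ if $|X_n| < n$; $Y_n = 0$ if $|X_n| \ge n$, then except on a set of probability 0, $Y_n = X_n$ for sufficiently large n. Thus assuming (without loss of generality) that m = 0, it suffices to show that

\[ \frac{1}{n} \sum_{j=1}^n Y_j \to 0 \quad \text{a.e.} \]

Now

\[
\begin{aligned}
E(Y_n) &= E(X_n I_{\{|X_n| < n\}}) \\
&= E(X_1 I_{\{|X_1| < n\}}) \quad \text{by the iid hypothesis} \\
&\to E(X_1) = 0 \text{ as } n \to \infty \quad \text{by the dominated convergence theorem.}
\end{aligned}
\]

Consequently,

\[ \frac{1}{n} \sum_{j=1}^n E(Y_j) \to 0, \]

and therefore it is sufficient to show that

\[ \frac{1}{n} \sum_{j=1}^n [Y_j - E(Y_j)] \to 0 \quad \text{a.e.} \]

If we can show that

\[ \sum_{n=1}^\infty \frac{Y_n - E(Y_n)}{n} \]

converges a.e., then the Kronecker lemma 6.1.3 with $b_n = n$ and

\[ x_n = \frac{Y_n - E(Y_n)}{n} \]

yields

\[ \frac{1}{n} \sum_{j=1}^n [Y_j - E(Y_j)] \to 0 \quad \text{a.e.,} \]

as desired. Now the $Y_n$ are functions of the independent random variables $X_n$, and hence are independent, so by 6.2.1, it suffices to show that

\[ V = \sum_{n=1}^\infty \operatorname{Var}\Big( \frac{Y_n}{n} \Big) < \infty. \]

(Note that $|Y_n| \le n$, so $\operatorname{Var} Y_n$ is finite, although nothing is known about $\operatorname{Var} X_n$.) But

\[
\begin{aligned}
V &\le \sum_{n=1}^\infty \frac{1}{n^2} E(Y_n^2) \quad \text{since } \operatorname{Var} Y_n = E(Y_n^2) - [E(Y_n)]^2 \\
&= \sum_{n=1}^\infty \frac{1}{n^2} E(X_1^2 I_{\{|X_1| < n\}}) \quad \text{by the iid hypothesis} \\
&= \sum_{n=1}^\infty \frac{1}{n^2} \sum_{m=1}^n E(X_1^2 I_{\{m-1 \le |X_1| < m\}}) \\
&= \sum_{m=1}^\infty E(X_1^2 I_{\{m-1 \le |X_1| < m\}}) \sum_{n=m}^\infty \frac{1}{n^2}.
\end{aligned}
\]

By comparing $\sum_{n=m}^\infty (1/n^2)$ with $\int (1/x^2)\, dx$, we find that $\sum_{n=m}^\infty (1/n^2) \le K/m$ for some fixed positive constant K. Thus

\[ V \le K \sum_{m=1}^\infty \frac{1}{m} E(X_1^2 I_{\{m-1 \le |X_1| < m\}}). \]

If $m - 1 \le |X_1| < m$, then $X_1^2 = |X_1|\,|X_1| \le m|X_1|$; hence

\[ V \le K \sum_{m=1}^\infty E(|X_1| I_{\{m-1 \le |X_1| < m\}}) = K E(|X_1|) < \infty. \quad \square \]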
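A simulation gives a feel for the iid strong law. The sketch below draws exponential variables with mean m = 1 (a hypothetical choice; any distribution with finite mean would serve) and prints the running averages $S_n/n$ along one sample path. A fixed seed is used only to make the run reproducible.

```python
import random

def running_means(samples):
    """Return the list of averages S_n / n, n = 1, ..., len(samples)."""
    total, out = 0.0, []
    for n, x in enumerate(samples, start=1):
        total += x
        out.append(total / n)
    return out

rng = random.Random(12345)                                  # fixed seed (assumption)
samples = [rng.expovariate(1.0) for _ in range(200_000)]    # iid with mean m = 1
means = running_means(samples)
for n in (10, 1_000, 200_000):
    print(n, means[n - 1])                                  # settles near m = 1
```

A single path cannot prove almost everywhere convergence, of course; it only illustrates the behavior the theorem guarantees with probability 1.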


If $E(X_1)$ exists but is not necessarily finite in Theorem 6.2.5, the result still holds. To see this, first assume that the $X_i$ are nonnegative, with infinite expectation. If M > 0 and $S_n' = \sum_{i=1}^n X_i I_{\{X_i \le M\}}$, then, almost everywhere,

\[ \liminf_{n \to \infty} \frac{S_n}{n} \ge \liminf_{n \to \infty} \frac{S_n'}{n} = E(X_1 I_{\{X_1 \le M\}}) \to E(X_1) = \infty \quad \text{as } M \to \infty. \]

Therefore $n^{-1} S_n \to \infty$ a.e. The general case is handled by splitting the random variables $X_i$ into positive and negative parts.

We conclude this section with a remarkable result about independent random variables. If $X_1, X_2, \ldots$ are independent, this question might arise: What is the probability that $\sum_{n=1}^\infty X_n$ converges? It might be expected that examples exist with any number between 0 and 1 as the probability of convergence, but in fact, the probability must be 0 or 1. Many other events defined in terms of independent random variables have this "zero–one" property, as we shall see.


6.2.6 Definitions and Comments. Let $X_1, X_2, \ldots$ be a sequence of random variables, and let $\mathcal{F}_n = \sigma(X_n, X_{n+1}, \ldots)$, $n = 1, 2, \ldots$; $\mathcal{F}_n$ may be thought of as the σ-field of events involving $X_n, X_{n+1}, \ldots$. The σ-field $\mathcal{F}_\infty = \bigcap_{n=1}^\infty \mathcal{F}_n$ is called the tail σ-field of the $X_n$, sets in $\mathcal{F}_\infty$ are called tail events, and $\mathcal{F}_\infty$-measurable functions, that is, functions $f: (\Omega, \mathcal{F}_\infty) \to (\overline{\mathbb{R}}, \mathcal{B}(\overline{\mathbb{R}}))$, are called tail functions (relative to the $X_n$). Intuitively, a tail event is one whose occurrence or nonoccurrence is not affected by changing the values of finitely many of the $X_i$, and a tail function is one whose value is not affected by such a change. Thus $\{\lim_{n \to \infty} X_n \text{ exists}\}$, $\{\sum_n X_n \text{ converges}\}$, and $\{X_n < 1 \text{ for infinitely many } n\}$ are tail events, and $\limsup_{n \to \infty} X_n$ and $\liminf_{n \to \infty} X_n$ are tail functions. [Example of a formal proof: $\{\sum_{k=1}^\infty X_k \text{ converges}\} = \{\sum_{k=n}^\infty X_k \text{ converges}\} \in \mathcal{F}_n$ for each n; hence the event belongs to $\mathcal{F}_\infty$. Similarly,

\[ \Big\{ \limsup_{k \to \infty} X_k < c \Big\} = \Big\{ \limsup_{k \to \infty} X_{n+k} < c \Big\} \in \mathcal{F}_n \quad \text{for all } n; \]

hence $\limsup_n X_n$ is $\mathcal{F}_\infty$-measurable.]




6.2.7 Kolmogorov Zero–One Law. All tail events relative to a sequence of independent random variables have probability 0 or 1, and all tail functions are constant almost everywhere.

PROOF. Fix $A \in \mathcal{F}_\infty$; the idea is to show that A is independent of itself, so that $P(A \cap A) = P(A)P(A)$, and consequently P(A) = 0 or 1. Since $\mathcal{F}_\infty \subset \mathcal{F}_1$, A is of the form $\{(X_1, X_2, \ldots) \in A'\}$ for some $A' \in [\mathcal{B}(\mathbb{R})]^\infty$. Let $\mathcal{C}$ be the class of sets $C' \in [\mathcal{B}(\mathbb{R})]^\infty$ such that A and C are independent, where $C = \{(X_1, X_2, \ldots) \in C'\}$. If C' is a measurable cylinder, then C is of the form $\{(X_1, \ldots, X_n) \in B_n\}$; since $A \in \mathcal{F}_{n+1}$, A can be written in the form $\{(X_{n+1}, X_{n+2}, \ldots) \in A_{n+1}\}$, and it follows that A and C are independent. Thus $\mathcal{C}$ contains all measurable cylinders. But if $C_n' \in \mathcal{C}$ and $C_n' \uparrow C'$ (or $C_n' \downarrow C'$), then $P(A \cap C_n) = P(A)P(C_n)$, $n = 1, 2, \ldots$, and $C_n \uparrow C$ (or $C_n \downarrow C$); hence $P(A \cap C) = P(A)P(C)$. Therefore $\mathcal{C}$ is a monotone class containing the measurable cylinders; hence $\mathcal{C}$ contains all sets in $[\mathcal{B}(\mathbb{R})]^\infty$, in particular, A'. But then A is independent of itself.

Finally, if f is a tail function, then for each $c \in \mathbb{R}$, $\{\omega: f(\omega) < c\}$ is a tail event, and hence has probability 0 or 1. If $k = \sup\{c \in \mathbb{R}: P\{f < c\} = 0\}$, then f = k a.e. □


If the $X_n$ are identically distributed as well as independent, a wider class of events will be shown to have probability 0 or 1.


6.2.8 Definitions and Comments. Let $X_1, X_2, \ldots$ be a sequence of random variables, and define the σ-fields $\mathcal{F}_n$ as in 6.2.6. Let $A \in \mathcal{F}_1$, so that $A = \{(X_1, X_2, \ldots) \in A'\}$, $A' \in [\mathcal{B}(\mathbb{R})]^\infty$. The event A is said to be symmetric iff the occurrence or nonoccurrence of A is not affected by a permutation of finitely many of the $X_i$. Formally, if $T: \{1, 2, \ldots\} \to \{1, 2, \ldots\}$ is a permutation of finitely many coordinates, we require that $A = \{(X_{T(1)}, X_{T(2)}, \ldots) \in A'\}$. Any tail event A is symmetric, for if T permutes the first n coordinates, we may write A in the form $\{(X_{n+1}, X_{n+2}, \ldots) \in A_{n+1}\}$ since $A \in \mathcal{F}_{n+1}$. Thus

\[ A = \{(X_1, X_2, \ldots) \in \mathbb{R}^n \times A_{n+1}\} = \{(X_{T(1)}, X_{T(2)}, \ldots) \in \mathbb{R}^n \times A_{n+1}\} \]

since $T(k) = k$, $k > n$.

There are, however, symmetric events that are not tail events, for example, $\{X_n = 0 \text{ for all } n\}$ and $\{\lim_{n \to \infty} (X_1 + \cdots + X_n) \text{ exists and is less than } c\}$.

If $B = \{(X_1, X_2, \ldots) \in B'\} \in \mathcal{F}_1$, not necessarily symmetric, and T permutes finitely many coordinates, we denote by X(T) the sequence $(X_{T(1)}, X_{T(2)}, \ldots)$ and we denote by B(T) the event $\{X(T) \in B'\}$. Thus B is symmetric iff B = B(T) for all T.

6.2.9 Hewitt–Savage Zero–One Law. Let $X_1, X_2, \ldots$ be iid random variables. If A is a symmetric set in $\sigma(X_1, X_2, \ldots)$, then P(A) = 0 or 1.

PROOF. Let $A = \{(X_1, X_2, \ldots) \in A'\}$ so that $P(A) = P_X(A')$, $X = (X_1, X_2, \ldots)$. Find measurable cylinders $C_k'$ such that $P_X(A' \,\Delta\, C_k') \to 0$ as $k \to \infty$ (see 1.3.11), and let $C_k = \{X \in C_k'\}$; say $C_k = \{(X_1, \ldots, X_{n_k}) \in B_k\}$. Let $T_k$ interchange $(1, \ldots, n_k)$ and $(n_k + 1, \ldots, 2n_k)$. Since the $X_n$ are iid, X and $X(T_k)$ have the same distribution; therefore

\[
\begin{aligned}
P(A \,\Delta\, C_k) = P_X(A' \,\Delta\, C_k') &= P_{X(T_k)}(A' \,\Delta\, C_k') \\
&= P[\{X(T_k) \in A'\} \,\Delta\, \{X(T_k) \in C_k'\}] \\
&= P[\{X \in A'\} \,\Delta\, \{X(T_k) \in C_k'\}] \quad \text{since A is symmetric} \\
&= P(A \,\Delta\, C_k(T_k)).
\end{aligned}
\]

Thus $P(A \,\Delta\, C_k)$ and $P(A \,\Delta\, C_k(T_k))$ approach 0, and therefore so does $P(A \,\Delta\, [C_k \cap C_k(T_k)])$. It follows that $P(C_k)$, $P(C_k(T_k))$ and $P[C_k \cap C_k(T_k)]$ all approach P(A). But

\[
\begin{aligned}
P[C_k \cap C_k(T_k)] &= P\{(X_1, \ldots, X_{n_k}) \in B_k,\ (X_{n_k+1}, \ldots, X_{2n_k}) \in B_k\} \\
&= P(C_k) P(C_k(T_k)).
\end{aligned}
\]

Let $k \to \infty$ to obtain $P(A) = P(A)P(A)$. □


Problems 


1. Let $X_1, X_2, \ldots$ be iid random variables. If

\[ \frac{1}{n} \sum_{k=1}^n X_k \]

converges a.e. to a finite limit, show that $E(X_1)$ is finite and the limit equals $E(X_1)$ a.e.


2. The following result generalizes Lemma 6.2.4, which was used in the proof of the strong law of large numbers.
Let F be an increasing, right-continuous function from the set of nonnegative real numbers to itself, with F(0) = 0. If Y is an arbitrary nonnegative random variable, show that

\[ E[F(Y)] = \int_0^\infty P\{Y \ge \lambda\}\, dF(\lambda). \]

In particular, if $F(\lambda) = \lambda$ we obtain

\[ E(Y) = \int_0^\infty P\{Y \ge \lambda\}\, d\lambda. \]

We obtain 6.2.4 by expressing the integral from 0 to ∞ as a sum of integrals from n to n + 1. [Let $h(\omega, \lambda) = 1$ if $Y(\omega) \ge \lambda$, and 0 if $Y(\omega) < \lambda$, and use Fubini's theorem. Note also that $\{Y \ge \lambda\}$ can be replaced by $\{Y > \lambda\}$ in the above formulas; the same proof applies.]


3. Give an example to show that the Hewitt–Savage zero–one law may fail when the $X_n$ are independent but not identically distributed.


4. Let $X_1, X_2, \ldots$ be iid random variables, and let $g: (\mathbb{R}^\infty, \mathcal{B}^\infty) \to (\mathbb{R}, \mathcal{B})$, where $\mathcal{B} = \mathcal{B}(\mathbb{R})$. If g is symmetric, in other words, $g(x_{T(1)}, x_{T(2)}, \ldots) = g(x_1, x_2, \ldots)$ whenever T is a permutation of finitely many coordinates, show that $g(X_1, X_2, \ldots)$ is constant a.e.


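5. Let $X_1, X_2, \ldots$ be independent random variables, with $P\{X_n = 1\} = p_n$, $P\{X_n = 0\} = 1 - p_n$. Show that

\[ X_n \to 0 \text{ in probability} \quad \text{iff} \quad \lim_{n \to \infty} p_n = 0 \]

and

\[ X_n \to 0 \text{ a.e.} \quad \text{iff} \quad \sum_{n=1}^\infty p_n < \infty. \]


6. Let $X_1, X_2, \ldots$ be nonnegative random variables, with $X_n$ having density $\lambda_n \exp(-\lambda_n x)$, $x \ge 0$ ($\lambda_n > 0$).

(a) If $\sum_{n=1}^\infty \lambda_n^{-1} < \infty$, show that $\sum_{n=1}^\infty X_n < \infty$ a.e.

(b) If the $X_n$ are independent and $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$, show that

\[ \sum_{n=1}^\infty X_n = \infty \quad \text{a.e.} \]

[In (b), consider $\exp(-\sum_{k=1}^n X_k)$.]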


7. Let $X_1, X_2, \ldots$ be iid random variables, with $P\{X_n = i\} = 1/r$, $i = 0, 1, \ldots, r - 1$, and define $X = \sum_{n=1}^\infty r^{-n} X_n$. Thus X is the number in [0, 1] with r-adic expansion $.X_1 X_2 \cdots$.

(a) Show that X is uniformly distributed, in other words, $P_X$ is Lebesgue measure.

(b) Show that for almost every $x \in [0, 1]$ (Lebesgue measure), the following condition holds:

For every $r = 2, 3, \ldots$ and all $i = 0, 1, \ldots, r - 1$, the relative frequency of i in the first n digits of the r-adic expansion of x converges to 1/r as $n \to \infty$.

(c) If $x \in [0, 1]$, let $R_n(x) = 2x_n - 1$, where $x_n$ is the nth digit of the binary expansion of x (to avoid ambiguity, eliminate expansions with only a finite number of zeros). The $R_n$ are called the Rademacher functions. Use part (a) to show that $\int_0^1 R_n(x)\, dx = 0$, and

\[ \int_0^1 R_n(x) R_m(x)\, dx = \begin{cases} 0, & n \ne m, \\ 1, & n = m. \end{cases} \]

6.3 MARTINGALES

Probability theory has its roots in games of chance, and it is often profitable to interpret results in terms of a gambling situation. For example, if $X_1, X_2, \ldots$ is a sequence of random variables, we may think of $X_n$ as our total winnings after n trials in a succession of games. Having survived the first n trials, our expected fortune after trial n + 1 is $E(X_{n+1} \mid X_1, \ldots, X_n)$. If this equals $X_n$, the game is "fair" since the expected gain on trial n + 1 is $E(X_{n+1} - X_n \mid X_1, \ldots, X_n) = X_n - X_n = 0$. If $E(X_{n+1} \mid X_1, \ldots, X_n) \ge X_n$, the game is "favorable," and if $E(X_{n+1} \mid X_1, \ldots, X_n) \le X_n$, the game is "unfavorable."

We are going to study sequences of this type; the results to be obtained will 
have significance outside the casino as well as inside. 


6.3.1 Definitions. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{X_1, X_2, \ldots\}$ a sequence of integrable random variables on $(\Omega, \mathcal{F}, P)$, and $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$ an increasing sequence of sub σ-fields of $\mathcal{F}$; $X_n$ is assumed $\mathcal{F}_n$-measurable, that is, $X_n: (\Omega, \mathcal{F}_n) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. The sequence $\{X_n\}$ is said to be a martingale relative to the $\mathcal{F}_n$ (alternatively, we say that $\{X_n, \mathcal{F}_n\}$ is a martingale) iff for all $n = 1, 2, \ldots$, $E(X_{n+1} \mid \mathcal{F}_n) = X_n$ a.e., a submartingale iff $E(X_{n+1} \mid \mathcal{F}_n) \ge X_n$ a.e., a supermartingale iff $E(X_{n+1} \mid \mathcal{F}_n) \le X_n$ a.e. (In statements involving conditional expectations, the "a.e." is always understood and will usually be omitted.)


Let $\{\mathcal{F}_n\}$ be a decreasing sequence of sub σ-fields of $\mathcal{F}$, with $X_n$ assumed $\mathcal{F}_n$-measurable. If $E(X_n \mid \mathcal{F}_{n+1}) = X_{n+1}$, we say that $\{X_n, \mathcal{F}_n\}$ is a reverse martingale. Similarly, $E(X_n \mid \mathcal{F}_{n+1}) \ge X_{n+1}$ defines a reverse submartingale, and $E(X_n \mid \mathcal{F}_{n+1}) \le X_{n+1}$ defines a reverse supermartingale.


6.3.2 Comments. (a) If $\{X_n, \mathcal{F}_n\}$ is a martingale, then

\[ E(X_{n+k} \mid \mathcal{F}_n) = X_n, \quad n, k = 1, 2, \ldots \]

(with corresponding statements for sub- and supermartingales). For

\[
\begin{aligned}
E(X_{n+2} \mid \mathcal{F}_n) &= E[E(X_{n+2} \mid \mathcal{F}_{n+1}) \mid \mathcal{F}_n] \quad \text{by 5.5.10(a)} \\
&= E(X_{n+1} \mid \mathcal{F}_n) \\
&= X_n.
\end{aligned}
\]

The general statement follows by induction.

(b) If $\{X_n, \mathcal{F}_n\}$ is a martingale, then

\[ E(X_{n+1} \mid X_1, \ldots, X_n) = X_n, \quad n = 1, 2, \ldots. \]

Thus $\{X_n\}$ is automatically a martingale relative to the standard σ-fields $\sigma(X_1, \ldots, X_n)$ (with corresponding statements for sub- and supermartingales).

For $\mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n$, and thus $X_1, \ldots, X_n$ are all $\mathcal{F}_n$-measurable. Since $\sigma(X_1, \ldots, X_n)$ is the smallest σ-field making $X_1, \ldots, X_n$ measurable [see 5.4.2(b)], we have $\sigma(X_1, \ldots, X_n) \subset \mathcal{F}_n$. If, in the defining relation

\[ E(X_{n+1} \mid \mathcal{F}_n) = X_n, \]

we take conditional expectations with respect to $\sigma(X_1, \ldots, X_n)$, we obtain the desired result by 5.5.10(a) and 5.5.11(a).

If we say that $\{X_n\}$ is a martingale (or sub-, supermartingale) without mentioning the σ-fields $\mathcal{F}_n$, we shall always mean $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, so that $E(X_{n+1} \mid X_1, \ldots, X_n) = X_n$.

(c) $\{X_n, \mathcal{F}_n\}$ is a martingale iff

\[ \int_A X_n\, dP = \int_A X_{n+1}\, dP \quad \text{for all } A \in \mathcal{F}_n,\ n = 1, 2, \ldots. \]

This follows since the condition $E(X_{n+1} \mid \mathcal{F}_n) = X_n$ a.e. [P] is equivalent to

\[ \int_A E(X_{n+1} \mid \mathcal{F}_n)\, dP = \int_A X_n\, dP \quad \text{for all } A \in \mathcal{F}_n \]

(see 1.6.11); also

\[ \int_A E(X_{n+1} \mid \mathcal{F}_n)\, dP = \int_A X_{n+1}\, dP \]

by definition of conditional expectation.

Similarly, $\{X_n, \mathcal{F}_n\}$ is a submartingale iff

\[ \int_A X_n\, dP \le \int_A X_{n+1}\, dP \quad \text{for all } A \in \mathcal{F}_n, \]

and a supermartingale iff

\[ \int_A X_n\, dP \ge \int_A X_{n+1}\, dP \quad \text{for all } A \in \mathcal{F}_n. \]

In particular, $E(X_n)$ is constant in a martingale, increases in a submartingale, and decreases in a supermartingale.
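
(d) The defining condition for a martingale relative to the σ-fields $\sigma(X_1, \ldots, X_n)$ is equivalent to

\[ E(X_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = x_n \quad \text{a.e. } [P_{(X_1, \ldots, X_n)}], \]

with similar statements for sub- and supermartingales.

For if $A \in \sigma(X_1, \ldots, X_n)$, then A is of the form $\{(X_1, \ldots, X_n) \in B\}$, $B \in \mathcal{B}(\mathbb{R}^n)$ (see 5.4.1). If $X = (X_1, \ldots, X_n)$, then

\[
\begin{aligned}
\int_A X_{n+1}\, dP &= \int_{\{X \in B\}} X_{n+1}\, dP \\
&= \int_B E(X_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n)\, dP_X \quad \text{by 5.3.3}
\end{aligned}
\]

and

\[
\begin{aligned}
\int_A X_n\, dP &= \int_B E(X_n \mid X_1 = x_1, \ldots, X_n = x_n)\, dP_X \\
&= \int_B x_n\, dP_X \quad \text{by 5.5.11(a').}
\end{aligned}
\]

The result now follows from (c).

(e) A finite sequence $\{X_k, \mathcal{F}_k, k = 1, \ldots, n\}$ is called a martingale iff $E(X_{k+1} \mid \mathcal{F}_k) = X_k$, $k = 1, 2, \ldots, n - 1$; finite sub- and supermartingale sequences are defined similarly.

(f) If $\{X_n, \mathcal{F}_n\}$ and $\{Y_n, \mathcal{F}_n\}$ are submartingales, so is $\{\max(X_n, Y_n), \mathcal{F}_n\}$. For $E(\max(X_{n+1}, Y_{n+1}) \mid \mathcal{F}_n) \ge E(X_{n+1} \mid \mathcal{F}_n) \ge X_n$, and similarly $E(\max(X_{n+1}, Y_{n+1}) \mid \mathcal{F}_n) \ge Y_n$.

The same approach shows that if $\{X_n, \mathcal{F}_n\}$ and $\{Y_n, \mathcal{F}_n\}$ are supermartingales, so is $\{\min(X_n, Y_n), \mathcal{F}_n\}$.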


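6.3.3 Examples. If $X_n \equiv X$, then $\{X_n\}$ is a martingale; if $X_1 \le X_2 \le \cdots$, then $\{X_n\}$ is a submartingale; if $X_1 \ge X_2 \ge \cdots$, then $\{X_n\}$ is a supermartingale (assuming all random variables integrable).

We give some more substantial examples.

(a) Let $Y_1, Y_2, \ldots$ be independent random variables with zero mean, and set $X_n = \sum_{k=1}^n Y_k$, $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$. Then $\{X_n, \mathcal{F}_n\}$ is a martingale. For

\[
\begin{aligned}
E(X_{n+1} \mid \mathcal{F}_n) &= E(X_n + Y_{n+1} \mid Y_1, \ldots, Y_n) \\
&= X_n + E(Y_{n+1} \mid Y_1, \ldots, Y_n) \quad \text{since } X_n \text{ is } \mathcal{F}_n\text{-measurable} \\
&= X_n + E(Y_{n+1}) \quad \text{by independence (Problem 1, Section 5.5)} \\
&= X_n \quad \text{since } E(Y_j) = 0.
\end{aligned}
\]

(b) Let $Y_1, Y_2, \ldots$ be independent random variables with $E(Y_j) = a_j \ne 0$, and set $X_n = \prod_{j=1}^n (Y_j/a_j)$, $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$. Then $\{X_n, \mathcal{F}_n\}$ is a martingale. For

\[
\begin{aligned}
E(X_{n+1} \mid \mathcal{F}_n) &= E\Big( \frac{X_n Y_{n+1}}{a_{n+1}} \,\Big|\, Y_1, \ldots, Y_n \Big) \\
&= X_n E\Big( \frac{Y_{n+1}}{a_{n+1}} \Big) \quad \text{by 5.5.11(a) and Problem 1, Section 5.5} \\
&= X_n.
\end{aligned}
\]

(c) Let Y be an integrable random variable on $(\Omega, \mathcal{F}, P)$.

If $\{\mathcal{F}_n\}$ is an increasing sequence of sub σ-fields of $\mathcal{F}$, and $X_n = E(Y \mid \mathcal{F}_n)$, then $\{X_n, \mathcal{F}_n\}$ is a martingale. For

\[
\begin{aligned}
E(X_{n+1} \mid \mathcal{F}_n) &= E[E(Y \mid \mathcal{F}_{n+1}) \mid \mathcal{F}_n] \\
&= E(Y \mid \mathcal{F}_n) \quad \text{since } \mathcal{F}_n \subset \mathcal{F}_{n+1} \\
&= X_n.
\end{aligned}
\]

If $\{\mathcal{F}_n\}$ is a decreasing sequence of sub σ-fields of $\mathcal{F}$, and $X_n = E(Y \mid \mathcal{F}_n)$, then $\{X_n, \mathcal{F}_n\}$ is a reverse martingale. For

\[
\begin{aligned}
E(X_n \mid \mathcal{F}_{n+1}) &= E[E(Y \mid \mathcal{F}_n) \mid \mathcal{F}_{n+1}] \\
&= E(Y \mid \mathcal{F}_{n+1}) \quad \text{since } \mathcal{F}_{n+1} \subset \mathcal{F}_n \\
&= X_{n+1}.
\end{aligned}
\]

Note that as in 6.3.2(a), $E(X_n \mid \mathcal{F}_{n+k}) = X_{n+k}$, $n, k = 1, 2, \ldots$.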


(d) (Branching Processes) We define a Markov chain (see 4.11) with state space $S = \{0, 1, 2, \ldots\}$. The state at time n, denoted by $X_n$, is to represent the number of offspring after n generations. We take $X_0 = 1$, and, if $X_n = k$, $X_{n+1}$ is the sum of k independent, identically distributed, nonnegative integer valued random variables, say $Y_1, \ldots, Y_k$, where $P\{Y_i = l\} = p_l$, $l = 0, 1, 2, \ldots$. Thus $p_l$ is the probability that a given being will produce exactly l offspring. (Formally, we take $p_{kj} = P\{Y_1 + \cdots + Y_k = j\}$, $k = 1, 2, \ldots$, $j = 0, 1, \ldots$; $p_{00} = 1$.)

Let $m = E(Y_i) = \sum_{l=0}^\infty l\, p_l$. If m is finite and greater than 0, then $\{X_n/m^n\}$ is a martingale [relative to

\[ \mathcal{F}_n = \sigma(X_0, \ldots, X_n) = \sigma\Big( \frac{X_0}{m^0}, \ldots, \frac{X_n}{m^n} \Big)]. \]

For

\[
\begin{aligned}
E\Big( \frac{X_{n+1}}{m^{n+1}} \,\Big|\, X_0 = i_0, \ldots, X_n = i_n \Big)
&= \frac{1}{m^{n+1}} \sum_{j=0}^\infty j\, p_{i_n j} \quad \text{(see 5.3.5(a), Eq. (2) and the definition of a Markov chain)} \\
&= \frac{1}{m^{n+1}} \sum_{j=0}^\infty j\, P\{Y_1 + \cdots + Y_{i_n} = j\} \\
&= \frac{1}{m^{n+1}} E(Y_1 + \cdots + Y_{i_n}) \\
&= \frac{i_n m}{m^{n+1}} = \frac{i_n}{m^n}.
\end{aligned}
\]

The result now follows from 6.3.2(d).

(e) Consider the branching process of part (d). Let $g(s) = \sum_{j=0}^\infty p_j s^j$, $s \ge 0$. If for some r we have g(r) = r, then $\{r^{X_n}\}$ is a martingale relative to the σ-fields $\sigma(X_0, \ldots, X_n)$. For as in (d),

\[
\begin{aligned}
E(r^{X_{n+1}} \mid X_0 = i_0, \ldots, X_n = i_n) &= \sum_{j=0}^\infty p_{i_n j}\, r^j \\
&= \sum_{j=0}^\infty r^j P\{Y_1 + \cdots + Y_{i_n} = j\} \\
&= E[r^{Y_1 + \cdots + Y_{i_n}}] \\
&= [E(r^{Y_1})]^{i_n} = [g(r)]^{i_n} = r^{i_n}.
\end{aligned}
\]

If $\{X_n, \mathcal{F}_n\}$ is a martingale, what can we say about $X_n^2$ or $|X_n|$? In order to answer this question, we need some basic convexity theorems.




6.3.4 Line of Support Theorem. Let $g: I \to \mathbb{R}$, where I is an open interval of reals, bounded or unbounded. Assume g is convex, that is,

\[ g(ax + (1 - a)y) \le a g(x) + (1 - a) g(y) \]

for all $x, y \in I$ and all $a \in [0, 1]$. Then there are sequences $\{a_n\}$ and $\{b_n\}$ of real numbers such that for all $y \in I$ we have $g(y) = \sup_n (a_n y + b_n)$.

PROOF. In the course of the proof, we develop many of the basic properties of convex functions of one variable.

Let g be a convex function from I to $\mathbb{R}$, I an open interval of reals. If $0 < h_1 < h_2$, then by convexity,

\[ g(x + h_1) \le \Big( \frac{h_2 - h_1}{h_2} \Big) g(x) + \frac{h_1}{h_2}\, g(x + h_2); \]

hence

\[ \frac{1}{h_1} [g(x + h_1) - g(x)] \le \frac{1}{h_2} [g(x + h_2) - g(x)] \tag{1} \]

and

\[ -\frac{1}{h_2} [g(x - h_2) - g(x)] \le -\frac{1}{h_1} [g(x - h_1) - g(x)]. \tag{2} \]

Also, if $h, h' > 0$,

\[ g(x) \le \frac{h'}{h + h'}\, g(x - h) + \frac{h}{h + h'}\, g(x + h'), \]

so

\[ \frac{g(x - h) - g(x)}{-h} \le \frac{g(x + h') - g(x)}{h'}. \tag{3} \]

By (1) and (2), the right and left derivatives $g_+'$ and $g_-'$ exist on I; by (3) they are finite. [Note that if $x \in I$, we have $x - h, x + h' \in I$ for small $h, h' > 0$ since I is open; thus, in fact, the difference quotients $[g(x + h') - g(x)]/h'$, which decrease as $h' \downarrow 0$ by (1), are bounded below by a finite constant.] Furthermore, by (1) and (2),

\[ g_+'(x) = \inf_{y > x} \frac{g(y) - g(x)}{y - x}, \qquad g_-'(x) = \sup_{y < x} \frac{g(x) - g(y)}{x - y} \tag{4} \]

and by (3),

\[ g_-'(x) \le g_+'(x). \tag{5} \]

If $x_1 < x_2$, then

\[ g_+'(x_1) \le \frac{g(x_2) - g(x_1)}{x_2 - x_1} = \frac{g(x_1) - g(x_2)}{x_1 - x_2} \le g_-'(x_2); \tag{6} \]

hence $g_+'$ and $g_-'$ are increasing on I. The existence of right and left derivatives implies that g has right and left limits at each point; thus g is continuous; for if not, $g_+'$ or $g_-'$ would be infinite at some point. If $y > x$, then $g(y) \ge g(x) + (y - x) g_+'(x)$, and if $y < x$, then $g(y) \ge g(x) + (y - x) g_-'(x)$ [by (4)]. We conclude that if $g_-'(x) \le a_x \le g_+'(x)$, then

\[ g(y) \ge g(x) + (y - x) a_x \quad \text{for all } y \in I. \tag{7} \]

The function $L_x$ given by $L_x(y) = g(x) + (y - x) a_x$, $y \in I$, is called a line of support for g at the point x. It is immediate that $g(x) = \sup\{L_s(x): s \in I\}$ [since $L_x(x) = g(x)$], but we are trying to prove that g is the sup of countably many lines of support. If $x \in I$, let r approach x through a sequence of rational numbers. Then

\[ L_r(x) = g(r) + (x - r) a_r. \]

But $g_-'(r) \le a_r \le g_+'(r)$, and the $g_\pm'$, being increasing functions, are bounded on any finite closed subinterval of I, so that the $a_r$ form a bounded sequence. Consequently, $L_r(x) \to g(x)$. By (7), $L_r(y) \le g(y)$ for all $y \in I$ and all r; hence $g(x) = \sup\{L_s(x): s \in I, s \text{ rational}\}$, that is, $g(x) = \sup\{g(s) + (x - s) a_s: s \in I, s \text{ rational}\}$. The proof is complete. □
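The conclusion of the theorem, that a convex function is the supremum of countably many lines of support taken at rational points, can be illustrated numerically. The sketch below assumes $g(x) = x^2$, for which $a_r = g'(r) = 2r$, and restricts to rationals r = k/d in [−2, 2]; the function names and these choices are hypothetical.

```python
from fractions import Fraction

def g(x):
    return x * x                       # a convex function on I = R

def support_line(r):
    """Line of support L_r(y) = g(r) + (y - r) * a_r, with a_r = g'(r) = 2r."""
    return lambda y: g(r) + (y - r) * 2 * r

def sup_of_lines(x, d):
    """Max of L_r(x) over rationals r = k/d in [-2, 2]."""
    return max(support_line(Fraction(k, d))(x) for k in range(-2 * d, 2 * d + 1))

x = 2 ** 0.5 / 2                       # an irrational point of I (as a float)
for d in (1, 10, 100, 1000):
    approx = sup_of_lines(x, d)
    print(d, approx, g(x) - approx)    # the gap shrinks as d grows
```

For this g one has $L_r(x) = x^2 - (x - r)^2$, so the gap is exactly the squared distance from x to the nearest admissible rational.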


If I is not open, the theorem is false: consider $g(x) = 0$, $0 \le x < 1$, $g(1) = 1$.

Now if X is a random variable with finite mean, we have seen that $E(X^2) \ge [E(X)]^2$ (4.10.6). This is a special case of the following general convexity theorem.


6.3.5 Jensen's Inequality. Let $g$ be a convex function from $I$ to $R$, where
$I$ is an open interval of reals, bounded or unbounded. Let $X$ be a random
variable on $(\Omega, \mathcal{F}, P)$, with $X(\omega) \in I$ for all $\omega$. Assume $E(X)$ to be finite. If
$\mathcal{G}$ is a sub $\sigma$-field of $\mathcal{F}$, then $E[g(X) \mid \mathcal{G}] \ge g[E(X \mid \mathcal{G})]$ a.e. In particular,
$E[g(X)] \ge g[E(X)]$.


Proof. First note that $E(X \mid \mathcal{G}) \in I$ a.e. For if, say, $a$ is real and $X > a$, then
$E(X \mid \mathcal{G}) > a$ a.e. because
$$0 \ge \int_{\{E(X \mid \mathcal{G}) \le a\}} E(X - a \mid \mathcal{G})\, dP = \int_{\{E(X \mid \mathcal{G}) \le a\}} (X - a)\, dP \ge 0;$$
hence $X = a$ a.e. on $\{E(X \mid \mathcal{G}) \le a\}$, which implies that $P\{E(X \mid \mathcal{G}) \le a\} = 0$.
Thus $g[E(X \mid \mathcal{G})]$ is well-defined.

By 6.3.4 we may write $g(y) = \sup_n (a_n y + b_n)$, $y \in I$, so $g(X) \ge a_n X + b_n$
for all $n$. Therefore $E[g(X) \mid \mathcal{G}] \ge a_n E(X \mid \mathcal{G}) + b_n$ a.e. Take the sup over $n$
to finish the proof. □
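A small numerical illustration of the conditional form of Jensen's inequality may help. The sketch below assumes a six-point sample space with uniform probability, a sub $\sigma$-field generated by a two-block partition, and $g(x) = x^2$; all of these choices, and the helper names, are hypothetical and serve only as an example.

```python
from fractions import Fraction

# Finite sample space {0, ..., 5} with uniform probability (an assumed toy model).
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}
X = {0: -2, 1: 0, 2: 1, 3: 3, 4: 4, 5: 6}
# The sub sigma-field is generated by this partition of the sample space.
partition = [{0, 1, 2}, {3, 4, 5}]

def g(x):
    # A convex function chosen for illustration.
    return x * x

def cond_exp(values, block):
    # On a block of the partition, E(values | G) is the probability-weighted average.
    p_block = sum(prob[w] for w in block)
    return sum(Fraction(values[w]) * prob[w] for w in block) / p_block

gX = {w: g(X[w]) for w in omega}
for block in partition:
    lhs = cond_exp(gX, block)      # E[g(X) | G] on this block
    rhs = g(cond_exp(X, block))    # g[E(X | G)] on this block
    print(sorted(block), lhs, rhs, lhs >= rhs)
```

On each block the left side dominates the right side, as the theorem requires.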


The proof of 6.3.5 shows that the hypothesis that $E(X)$ is finite may be
dropped if it is known that $E(X)$ and $E[g(X)]$ exist, and $E(X \mid \mathcal{G}) \in I$ a.e.

We are now able to answer the question raised earlier about $X_n^+$ and $|X_n|$
when $\{X_n, \mathcal{F}_n\}$ is a martingale.


6.3.6 Theorem. (a) Let $\{X_n, \mathcal{F}_n\}$ be a submartingale, $g$ a convex, increasing
function from $R$ to $R$. If $g(X_n)$ is integrable for all $n$, then $\{g(X_n), \mathcal{F}_n\}$ is
a submartingale. Thus, for example, if $\{X_n\}$ is a submartingale, so is $\{X_n^+\}$.

(b) Let $\{X_n, \mathcal{F}_n\}$ be a martingale, $g$ a convex function from $R$ to $R$. If
$g(X_n)$ is integrable for all $n$, then $\{g(X_n), \mathcal{F}_n\}$ is a submartingale. Thus if
$r \ge 1$, $\{X_n\}$ is a martingale and $|X_n|^r$ is integrable for all $n$, then $\{|X_n|^r\}$ is a
submartingale.


Proof. We have $E[g(X_{n+1}) \mid \mathcal{F}_n] \ge g[E(X_{n+1} \mid \mathcal{F}_n)]$ by Jensen's inequality. In
(a), $E(X_{n+1} \mid \mathcal{F}_n) \ge X_n$ by the submartingale property; hence $g[E(X_{n+1} \mid \mathcal{F}_n)]
\ge g(X_n)$ since $g$ is increasing. In (b), $E(X_{n+1} \mid \mathcal{F}_n) = X_n$ by the martingale
property, so $g[E(X_{n+1} \mid \mathcal{F}_n)] = g(X_n)$. The result follows. □
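As a worked instance of part (b), consider the following example, which is an illustration under assumed conditions rather than part of the text: let $Y_1, Y_2, \ldots$ be independent with $P\{Y_k = 1\} = P\{Y_k = -1\} = 1/2$, let $X_n = Y_1 + \cdots + Y_n$, $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$, and take $g(x) = x^2$. The submartingale property of $\{X_n^2\}$ can then be verified directly:

```latex
\begin{align*}
E(X_{n+1}^2 \mid \mathcal{F}_n)
  &= E\bigl((X_n + Y_{n+1})^2 \mid \mathcal{F}_n\bigr) \\
  &= X_n^2 + 2 X_n\, E(Y_{n+1}) + E(Y_{n+1}^2) \\
  &= X_n^2 + 1 \;\ge\; X_n^2 .
\end{align*}
```

The second equality uses the $\mathcal{F}_n$-measurability of $X_n$ and the independence of $Y_{n+1}$ from $\mathcal{F}_n$.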


Problems 


1. Let $X_n = \sum_{k=1}^n Y_k$, where the $Y_k$ are independent, with $P\{Y_k = 1\} = p$,
$P\{Y_k = -1\} = q$ ($p, q > 0$, $p + q = 1$). Show that $\{(q/p)^{X_n}\}$ is a martingale
relative to the $\sigma$-fields $\sigma(X_1, \ldots, X_n)$ [$= \sigma(Y_1, \ldots, Y_n)$].

2. Consider a Markov chain whose state space is the integers, and assume
that $p_{ij}$ depends only on the difference between $j$ and $i$: $p_{ij} = q_{j-i}$,
where $q_k \ge 0$, $\sum_k q_k = 1$.

(a) Show that if $X_n$ is the state at time $n$, $X_n$ may be written as
$X_0 + Y_1 + \cdots + Y_n$, where $X_0, Y_1, \ldots, Y_n$ are independent and the $Y_i$
all have the same distribution, namely, $P\{Y_i = k\} = q_k$, $k$ an integer.

(b) If $\sum_j q_j r^j = 1$ (where the series is assumed to converge absolutely),
show that $\{r^{X_n}\}$ is a martingale relative to the $\sigma$-fields $\sigma(X_0, \ldots, X_n)$.


3. Let $\lambda$ be a countably additive set function on the $\sigma$-field $\mathcal{F}$, and let $\mathcal{F}_n$
be generated by the sets $A_{n1}, A_{n2}, \ldots$, assumed to form a partition of $\Omega$,
with $P(A_{nj}) > 0$ for all $j$. Assume that the $(n+1)$st partition refines the
$n$th, so that $\mathcal{F}_n \subset \mathcal{F}_{n+1}$.

Define
$$X_n(\omega) = \frac{\lambda(A_{nj})}{P(A_{nj})} \quad \text{if } \omega \in A_{nj}.$$
Show that $\{X_n, \mathcal{F}_n\}$ is a martingale.


4. Define a sequence of random variables as follows. Let $X_1$ be uniformly
distributed between 0 and 1. Given that $X_1 = x_1, X_2 = x_2, \ldots, X_{n-1} = x_{n-1}$,
let $X_n$ be uniformly distributed between 0 and $x_{n-1}$. Show that $\{X_n\}$
is a supermartingale and $E(X_n) = 2^{-n}$. Conclude that $X_n \to 0$ a.e.


5. Let $X_1, X_2, \ldots$ be real-valued Borel measurable functions on $(\Omega, \mathcal{F})$.
Assume that under the probability measure $P$ on $\mathcal{F}$, $(X_1, \ldots, X_n)$ has
density $p_n$, and under the probability measure $Q$ on $\mathcal{F}$, $(X_1, \ldots, X_n)$ has
density $q_n$. Define
$$Y_n = \begin{cases} \dfrac{q_n(X_1, \ldots, X_n)}{p_n(X_1, \ldots, X_n)} & \text{if the denominator is greater than 0,} \\[2mm] 0 & \text{otherwise.} \end{cases}$$
Show that if $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, $\{Y_n, \mathcal{F}_n\}$ is a supermartingale on
$(\Omega, \mathcal{F}, P)$ and $0 \le E(Y_n) \le 1$ for all $n$.


6. Let $\{X_n, \mathcal{F}_n, n \ge 0\}$ be a supermartingale, and define
$$\begin{aligned}
Y_0 &= X_0, \\
Y_1 &= Y_0 + (X_1 - E(X_1 \mid \mathcal{F}_0)), \\
&\;\;\vdots \\
Y_n &= Y_{n-1} + (X_n - E(X_n \mid \mathcal{F}_{n-1})).
\end{aligned}$$
Also define
$$\begin{aligned}
A_0 &= 0, \\
A_1 &= X_0 - E(X_1 \mid \mathcal{F}_0), \\
A_2 &= A_1 + (X_1 - E(X_2 \mid \mathcal{F}_1)), \\
&\;\;\vdots \\
A_n &= A_{n-1} + (X_{n-1} - E(X_n \mid \mathcal{F}_{n-1})).
\end{aligned}$$
(a) Show that $X_n = Y_n - A_n$.
(b) Show that $\{Y_n, \mathcal{F}_n\}$ is a martingale.
(c) Show that for a.e. $\omega$, $A_n(\omega)$ increases with $n$.

Thus a supermartingale can be expressed as the difference between a
martingale and an increasing sequence. (Similarly, a submartingale can
be expressed as the sum of a martingale and an increasing sequence.)


6.4 MARTINGALE CONVERGENCE THEOREMS 
Under rather mild conditions, sub- and supermartingales converge almost 
everywhere. This result has very many ramifications in probability theory. 
We first prove a theorem which has an interesting gambling interpretation. 


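6.4.1 Optional Skipping Theorem (Halmos). Let $\{X_n, \mathcal{F}_n\}$ be a submartingale.
Let $\varepsilon_1, \varepsilon_2, \ldots$ be random variables defined by
$$\varepsilon_k = \begin{cases} 1 & \text{if } (X_1, \ldots, X_k) \in B_k, \\ 0 & \text{if } (X_1, \ldots, X_k) \notin B_k, \end{cases}$$
where the $B_k$ are arbitrary sets in $\mathcal{B}(R^k)$. Set
$$\begin{aligned}
Y_1 &= X_1, \\
Y_2 &= X_1 + \varepsilon_1 (X_2 - X_1), \\
&\;\;\vdots \\
Y_n &= X_1 + \varepsilon_1 (X_2 - X_1) + \cdots + \varepsilon_{n-1} (X_n - X_{n-1}).
\end{aligned}$$
Then $\{Y_n, \mathcal{F}_n\}$ is also a submartingale and $E(Y_n) \le E(X_n)$ for all $n$. If $\{X_n, \mathcal{F}_n\}$
is a martingale, so is $\{Y_n, \mathcal{F}_n\}$ and $E(Y_n) = E(X_n)$ for all $n$.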


Interpretation. Let $X_n$ be the gambler's fortune after $n$ trials; then $Y_n$
is our fortune if we follow an optional skipping strategy. After observing
$X_1, \ldots, X_k$, we may choose to bet with the gambler at trial $k + 1$ [in this case
$\varepsilon_k = \varepsilon_k(X_1, \ldots, X_k) = 1$] or we may pass ($\varepsilon_k = 0$). Our gain on trial $k + 1$
is $\varepsilon_k (X_{k+1} - X_k)$. The theorem states that whatever strategy we employ, if
the game is initially “fair” (a martingale) or “favorable” (a submartingale),
it remains fair (or favorable), and no strategy of this type can increase the
expected winning.


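Proof.
$$E(Y_{n+1} \mid \mathcal{F}_n) = E(Y_n + \varepsilon_n (X_{n+1} - X_n) \mid \mathcal{F}_n) = Y_n + \varepsilon_n E(X_{n+1} - X_n \mid \mathcal{F}_n)$$
since $\varepsilon_n$ is a Borel measurable function of $X_1, \ldots, X_n$, and hence is
$\sigma(X_1, \ldots, X_n) \subset \mathcal{F}_n$-measurable. Therefore
$$E(Y_{n+1} \mid \mathcal{F}_n) \begin{cases} = Y_n + \varepsilon_n (X_n - X_n) = Y_n & \text{in the martingale case,} \\ \ge Y_n + \varepsilon_n (X_n - X_n) = Y_n & \text{in the submartingale case.} \end{cases}$$

Since $Y_1 = X_1$, we have $E(X_1) = E(Y_1)$. Having shown $E(X_k - Y_k) \ge 0$
($= 0$ in the martingale case), we write
$$\begin{aligned}
X_{k+1} - Y_{k+1} &= X_{k+1} - Y_k - \varepsilon_k (X_{k+1} - X_k) \\
&= (1 - \varepsilon_k)(X_{k+1} - X_k) + X_k - Y_k.
\end{aligned}$$
Thus
$$\begin{aligned}
E(X_{k+1} - Y_{k+1} \mid \mathcal{F}_k) &= (1 - \varepsilon_k) E(X_{k+1} - X_k \mid \mathcal{F}_k) + E(X_k - Y_k \mid \mathcal{F}_k) \\
&\ge E(X_k - Y_k \mid \mathcal{F}_k) = X_k - Y_k,
\end{aligned}$$
with equality in the martingale case. Take expectations and use $E[E(X \mid \mathcal{G})]
= E(X)$ to obtain
$$E(X_{k+1} - Y_{k+1}) \ge E(X_k - Y_k) \ge 0,$$
with equality in the martingale case. □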
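The conclusion of the optional skipping theorem can be checked by exact enumeration in a small case. The sketch below assumes a walk with independent steps $\pm 1$, $P\{\text{step} = +1\} = p$ (a martingale for $p = 1/2$, a submartingale for $p > 1/2$), and a hypothetical strategy that bets only when the current fortune is negative; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def expectations(n, p, strategy):
    # Enumerate all step sequences of a walk with P{step = +1} = p, P{step = -1} = 1 - p.
    # X_1 is the first step; Y_n adds eps_k * (X_{k+1} - X_k), eps_k = strategy(X_1..X_k).
    ex = ey = Fraction(0)
    for steps in product((1, -1), repeat=n):
        weight = Fraction(1)
        for s in steps:
            weight *= p if s == 1 else 1 - p
        xs, total = [], 0
        for s in steps:
            total += s
            xs.append(total)            # xs[k - 1] holds X_k
        y = xs[0]                        # Y_1 = X_1
        for k in range(1, n):
            eps = strategy(xs[:k])       # depends only on X_1, ..., X_k
            y += eps * (xs[k] - xs[k - 1])
        ex += weight * xs[-1]
        ey += weight * y
    return ex, ey

def bet_when_behind(history):
    # Hypothetical strategy: play the next trial only if the current fortune is negative.
    return 1 if history[-1] < 0 else 0

print(expectations(6, Fraction(1, 2), bet_when_behind))  # martingale case
print(expectations(6, Fraction(3, 5), bet_when_behind))  # submartingale case
```

In the fair case both expectations are 0; in the favorable case $E(Y_n)$ does not exceed $E(X_n) = 6/5$.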


The key step in the development is the following result, due in its original 
form to Doob (1940). 


6.4.2 Upcrossing Theorem. Let $\{X_k, \mathcal{F}_k, k = 1, 2, \ldots, n\}$ be a submartingale.
If $a$ and $b$ are real numbers, with $a < b$, let $U_{ab}$ be the number of
upcrossings of $(a, b)$ by $X_1, \ldots, X_n$, defined as follows.

Let $T_1 = T_1(\omega)$ be the first integer in $\{1, 2, \ldots, n\}$ such that $X_{T_1} \le a$, $T_2$
be the first integer greater than $T_1$ such that $X_{T_2} \ge b$, $T_3$ be the first integer
greater than $T_2$ such that $X_{T_3} \le a$, $T_4$ be the first integer greater than $T_3$ such
that $X_{T_4} \ge b$, and so on. (Set $T_i = \infty$ if the condition cannot be satisfied.) If
$N$ is the number of finite $T_i$, define $U_{ab} = N/2$ if $N$ is even, and $(N - 1)/2$
if $N$ is odd. Then
$$E(U_{ab}) \le \frac{1}{b - a}\, E[(X_n - a)^+].$$


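Proof. First assume $a = 0$, and all $X_j \ge 0$. Define the $T_i$ as above ($X_{T_i} \le a$
is now equivalent to $X_{T_i} = 0$). Let $\varepsilon_j = 0$ for $j < T_1$; $\varepsilon_j = 1$ for $T_1 \le j < T_2$;
$\varepsilon_j = 0$ for $T_2 \le j < T_3$; $\varepsilon_j = 1$ for $T_3 \le j < T_4$, and so on (see
Fig. 6.4.1).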
Figure 6.4.1.


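In Figure 6.4.1, we have (with $n = 15$) $T_1 = 4$, $T_2 = 8$, $T_3 = 10$, $T_4 = 11$,
$T_5 = 14$, $T_6 = \infty$, $N = 5$, $U_{ab} = 2$;
$$\varepsilon_1 = \varepsilon_2 = \varepsilon_3 = 0, \quad \varepsilon_4 = \varepsilon_5 = \varepsilon_6 = \varepsilon_7 = 1, \quad \varepsilon_8 = \varepsilon_9 = 0,$$
$$\varepsilon_{10} = 1, \quad \varepsilon_{11} = \varepsilon_{12} = \varepsilon_{13} = 0, \quad \varepsilon_{14} = 1;$$
$$\begin{aligned}
X_1 + \varepsilon_1 (X_2 - X_1) &+ \cdots + \varepsilon_{14} (X_{15} - X_{14}) \\
&= X_1 + X_8 - X_4 + X_{11} - X_{10} + X_{15} - X_{14}.
\end{aligned}$$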


Note that $Y_n$, as defined in 6.4.1, is the total increase during upcrossings,
plus possibly a “partial upcrossing” at the end, plus a contribution due to
$X_1$ (necessarily nonnegative). Thus $Y_n \ge b\,U_{ab}$. But the $\varepsilon_j$ can be expressed in
terms of $X_1, \ldots, X_j$, so the optional skipping theorem applies; hence $\{Y_k, \mathcal{F}_k,
k = 1, 2, \ldots, n\}$ is a submartingale, and $E(Y_n) \le E(X_n)$. Thus
$$E(U_{ab}) \le \frac{1}{b}\, E(Y_n) \le \frac{1}{b}\, E(X_n),$$
as asserted.

In general, $\{(X_k - a)^+, \mathcal{F}_k, k = 1, 2, \ldots, n\}$ is a submartingale by 6.3.6(a),
and the number of upcrossings of $(a, b)$ by $\{X_k\}$ is the same as the number
of upcrossings of $(0, b - a)$ by $\{(X_k - a)^+\}$ (note that $X_k \le a$, $X_k - a \le 0$, and
$(X_k - a)^+ \le 0$ are equivalent, as are $X_k \ge b$, $X_k - a \ge b - a$, and
$(X_k - a)^+ \ge b - a$). The result follows from the above argument. □
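The definition of the times $T_i$ and of $U_{ab}$ translates directly into a short routine. The sketch below is a plain transcription of that definition; the function names and the sample path (chosen with hypothetical levels $a = 1$, $b = 3$ so that it reproduces the crossing times $4, 8, 10, 11, 14$ quoted for Figure 6.4.1) are assumptions made for illustration.

```python
def crossing_times(path, a, b):
    # Return the finite T_i for the path X_1, ..., X_n (indices start at 1):
    # odd-numbered times wait for a value <= a, even-numbered times for a value >= b.
    times = []
    waiting_for_low = True
    for j, x in enumerate(path, start=1):
        if waiting_for_low and x <= a:
            times.append(j)
            waiting_for_low = False
        elif not waiting_for_low and x >= b:
            times.append(j)
            waiting_for_low = True
    return times

def upcrossings(path, a, b):
    # U_ab = N/2 if N is even, (N - 1)/2 if N is odd, where N = number of finite T_i.
    return len(crossing_times(path, a, b)) // 2

# Hypothetical path with n = 15, a = 1, b = 3.
path = [2, 2, 2, 1, 2, 2, 2, 3, 2, 1, 3, 2, 2, 1, 2]
print(crossing_times(path, 1, 3))  # [4, 8, 10, 11, 14]
print(upcrossings(path, 1, 3))     # 2
```

Here $N = 5$ is odd, so the final descent to $a$ at time 14 starts only a partial upcrossing and is not counted.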


We now prove the main convergence theorem. 


6.4.3 Submartingale Convergence Theorem. Let $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ be
a submartingale. If $\sup_n E(X_n^+) < \infty$, there is an integrable random variable
$X_\infty$ such that $X_n \to X_\infty$ almost everywhere.


Proof. $P\{\omega: X_n(\omega) \text{ does not converge to a finite or infinite limit}\}$
$$= P\Bigl[\,\bigcup_{\substack{a < b \\ a, b \text{ rational}}} \bigl\{\omega: \liminf_{n \to \infty} X_n(\omega) < a < b < \limsup_{n \to \infty} X_n(\omega)\bigr\}\Bigr].$$
If for some $a < b$, $P\{\liminf_{n \to \infty} X_n < a < b < \limsup_{n \to \infty} X_n\} > 0$, then
$\{X_n\}$ has an infinite number of upcrossings of $(a, b)$ on a set of positive
probability; hence $E(U_{ab}) = \infty$. But $U_{ab}$ is the limit of the monotone
sequence $U_{ab,n}$ = the number of upcrossings of $(a, b)$ by $X_1, \ldots, X_n$, so that
$E(U_{ab,n}) \to E(U_{ab})$. But by 6.4.2,
$$E(U_{ab,n}) \le (b - a)^{-1} E[(X_n - a)^+] \le (b - a)^{-1} \bigl[\sup_n E(X_n^+) + |a|\bigr] < \infty,$$
a contradiction. Thus $X_n$ converges to a limit $X_\infty$ a.e. Now $|X_n| = X_n^+
+ X_n^- = 2X_n^+ - X_n$, and $E(X_n) \ge E(X_1)$ by the submartingale property.
Therefore
$$E(|X_n|) \le 2 \sup_n E(X_n^+) - E(X_1) < \infty.$$
By Fatou's lemma,
$$E(|X_\infty|) \le \liminf_{n \to \infty} E(|X_n|) < \infty;$$
hence $X_\infty$ is integrable, and therefore finite a.e. By changing $X_\infty$ on a set of
measure 0, if necessary, we may take $X_\infty$ as a random variable (rather than
an extended random variable). The theorem is proved. □


6.4.4 Corollary. Let $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ be a reverse submartingale [the
$\mathcal{F}_n$ decrease as $n$ increases, and $E(X_n \mid \mathcal{F}_{n+1}) \ge X_{n+1}$ a.e.]. If $\inf_n E(X_n)
> -\infty$, there is an integrable random variable $X_\infty$ such that $X_n \to X_\infty$ a.e.
[Note that the hypothesis is satisfied for any reverse martingale since $E(X_n)$
is constant.]


Proof. Proceed as in 6.4.3, but instead let $U_{ab,n}$ be the number of upcrossings
of $(a, b)$ by $\{X_n, X_{n-1}, \ldots, X_1\}$, which is a submartingale because
$$E(X_k \mid X_{k+1}, \ldots, X_n) = E[E(X_k \mid \mathcal{F}_{k+1}) \mid X_{k+1}, \ldots, X_n] \ge X_{k+1}.$$
We obtain $E(U_{ab,n}) \le (b - a)^{-1} E[(X_1 - a)^+] < \infty$, and thus $X_n \to X_\infty$ a.e.
as before.

Now $|X_n| = 2X_n^+ - X_n$, and $E(X_n) \ge \inf_n E(X_n) > -\infty$. Also $\{X_n^+, \ldots,
X_1^+\}$ is a submartingale by 6.3.6(a), so $E(X_n^+) \le E(X_1^+)$. Thus $E(|X_n|)
\le 2E(X_1^+) - \inf_n E(X_n) < \infty$, so $X_\infty$ is integrable by Fatou's lemma as
before. □


6.4.5 Comments. (a) In 6.4.3 and 6.4.4, the proofs show that $\{X_n\}$ must be
$L^1$ bounded, that is, $\sup_n E(|X_n|) < \infty$. Thus for a submartingale, $\sup_n E(X_n^+)
< \infty$ is equivalent to $L^1$ boundedness, and implies convergence. However,
a submartingale may converge without being $L^1$ bounded (see Problems 1
and 2).

(b) Results analogous to 6.4.3 and 6.4.4 hold for supermartingales: If
$\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ is a supermartingale and $\sup_n E(X_n^-) < \infty$, then there
is an integrable random variable $X_\infty$ such that $X_n \to X_\infty$ a.e. In particular,
a nonnegative supermartingale converges a.e. If $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ is a
reverse supermartingale and $\sup_n E(X_n) < \infty$, there is an integrable random
variable $X_\infty$ such that $X_n \to X_\infty$ a.e. The first statement follows from 6.4.3
since $\{-X_n, \mathcal{F}_n\}$ is a submartingale and $\sup_n E[(-X_n)^+] = \sup_n E(X_n^-)$. The
second follows from 6.4.4 since $\{-X_n, \mathcal{F}_n\}$ is a reverse submartingale and
$\inf_n E(-X_n) = -\sup_n E(X_n)$.


Problems 


1. Consider the following Markov chain. Take $X_1 = 0$. If $X_n = 0$ (regardless
of $X_k$, $k < n$), then:
$$X_{n+1} = \begin{cases} a_{n+1} & \text{with probability } p_{n+1}, \\ -a_{n+1} & \text{with probability } p_{n+1}, \\ 0 & \text{with probability } 1 - 2p_{n+1}, \end{cases}$$
where $0 < p_{n+1} < \tfrac{1}{2}$ and the $a_n$ are distinct and greater than 0. If $X_n \ne 0$,
take $X_{n+1} = X_n$ (thus if $X_n \ne 0$, we have $X_j = X_n$ for all $j \ge n$).
(a) Show that $\{X_n\}$ is a martingale, and $X_n$ converges everywhere.
(b) If $\sum_k p_k < \infty$ and $\sum_k a_k p_k = \infty$, show that $\sup_n E(|X_n|) = \infty$.


2. (Problem by W. F. Stout, personal communication.) Consider the following
Markov chain. Take $X_0 = 0$, and let
$$\begin{aligned}
P\{X_{n+1} = n + 1 \mid X_n = n\} &= p_{n+1}, \\
P\{X_{n+1} = -(n + 1) \mid X_n = n\} &= 1 - p_{n+1}, \\
P\{X_{n+1} = -k \mid X_n = -k\} &= 1 \qquad (n = 0, 1, \ldots,\ k = 1, 2, \ldots).
\end{aligned}$$
(a) Show that if $p_{n+1} = (2n + 1)/(2n + 2)$ for all $n$, then $\{X_n\}$ is a
martingale.
(b) If the $p_n$ are chosen as in (a), show that the martingale converges
a.e. to a finite limit, although $E(|X_n|) \to \infty$.

(Note: In Problems 1 and 2, the Markov chain has nonstationary transition
probabilities; in other words, the probability of moving from state $i$ at time
$n$ to state $j$ at time $n + 1$ depends on $n$. However, the basic construction of
4.11.2 carries over.)
3. (Kemeny, Snell, and Knapp, 1966) Let $\{X_n\}$ be a Markov chain with
state space $S$ = the set of rationals in $(0, 1)$, and the following transition
probabilities:

Let $0 < b \le a < 1$, $a, b$ rational. If $x \in S$ and $X_n = x$, then $X_{n+1} = bx$
with probability $1 - x$, and $X_{n+1} = bx + 1 - a$ with probability $x$.
(a) If $a = b$, show that $\{X_n\}$ is a martingale, and $X_n \to X_\infty$ a.e., where
$X_\infty = 0$ or 1; also $P\{X_\infty = 1\} = E(X_0)$.
(b) If $b < a$, show that $\{X_n\}$ is a supermartingale and $X_n \to 0$ a.e.


4. (Polya urn scheme) An urn contains white and black balls; one ball is
drawn, and then replaced by two of the same color, and the process is
repeated. Thus if the urn contains $c$ white and $r - c$ black balls, and a
white ball is drawn, the fraction of white balls in the urn before the next
drawing is $(c + 1)/(r + 1)$.

If $X_n$ is the fraction of white balls in the urn just before the $n$th drawing,
show that $\{X_n\}$ is a martingale, and $X_n$ converges a.e. to a limit $X_\infty$, where
$E(X_\infty) = E(X_1)$.
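The one-step averaging behind the Polya urn can be spot-checked with exact arithmetic. The sketch below is only a numerical check for particular urn compositions, not a proof; the function name is a hypothetical helper.

```python
from fractions import Fraction

def next_fraction_mean(c, r):
    # Urn with c white balls out of r; the drawn ball is replaced by two of its color.
    x = Fraction(c, r)                      # current fraction of white balls
    white_drawn = Fraction(c + 1, r + 1)    # new fraction if a white ball is drawn
    black_drawn = Fraction(c, r + 1)        # new fraction if a black ball is drawn
    return x * white_drawn + (1 - x) * black_drawn

for c, r in [(1, 2), (2, 5), (7, 10)]:
    # The conditional mean of the next fraction is printed beside the current fraction.
    print(c, r, next_fraction_mean(c, r), Fraction(c, r))
```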

5. Martingales may be defined on a measure space $(\Omega, \mathcal{F}, \mu)$ if $\mu$ is finite;
simply replace $\mu$ by the probability measure $P = \mu/[\mu(\Omega)]$. If $\mu(\Omega) = \infty$
we can use 6.3.2(c) in the definition: $\int_A X_n\, d\mu = \int_A X_{n+1}\, d\mu$ for all $A \in \mathcal{F}_n$
(of course, the $X_n$ are still required to be $\mathcal{F}_n$-measurable and integrable).
Sub- and supermartingales may be defined similarly.

Show that an $L^1$ bounded submartingale converges a.e. [$\mu$] to an
integrable limit.


6.5 UNIFORM INTEGRABILITY 
We now introduce a concept that has important application to martingale 
theory, and in fact to integration theory in general. 


6.5.1 Definitions and Comments. Let $f_1, f_2, \ldots$ be real- or complex-valued
Borel measurable functions on the measure space $(\Omega, \mathcal{F}, \mu)$, $\mu$ finite. The $f_n$
are said to be uniformly integrable iff
$$\int_{\{|f_n| \ge c\}} |f_n|\, d\mu \to 0 \quad \text{as } c \to \infty$$
uniformly in $n$. (The definition is the same for an uncountable family $\{f_i\}$.)

It is immediate that if the $f_n$ are uniformly integrable, each $f_n$ is integrable.
Also, if $|f_n| \le g$ for all $n$, where $g$ is integrable, in particular, if the $f_n$ are
uniformly bounded, then the $f_n$ are uniformly integrable.
Furthermore, if the $f_n$ are uniformly integrable, then $\sup_n \int_\Omega |f_n|\, d\mu < \infty$,
because if $\varepsilon > 0$,
$$\int_\Omega |f_n|\, d\mu = \int_{\{|f_n| \ge c\}} |f_n|\, d\mu + \int_{\{|f_n| < c\}} |f_n|\, d\mu \le \varepsilon + c\,\mu(\Omega)$$
for large $c$.
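The definition can be explored on two simple families on $[0, 1]$ with Lebesgue measure. The sketch below assumes $f_n = n$ on $[0, 1/n]$ (zero elsewhere) and a thinner family $f_n = n$ on $[0, 1/n^2]$; both families, the helper names, and the cutoff `n_max` (a finite stand-in for the supremum over all $n$) are illustrative assumptions.

```python
from fractions import Fraction

def tail_integral(height, width, c):
    # f = height on a set of measure width, 0 elsewhere; the integral of |f|
    # over {|f| >= c} is height * width if height >= c, and 0 otherwise.
    return height * width if height >= c else Fraction(0)

def spike(n):
    # f_n = n on [0, 1/n]: every f_n has integral 1.
    return Fraction(n), Fraction(1, n)

def thin_spike(n):
    # f_n = n on [0, 1/n^2]: f_n has integral 1/n.
    return Fraction(n), Fraction(1, n * n)

def sup_tail(family, c, n_max=10_000):
    # Supremum of the tail integrals over n = 1, ..., n_max.
    return max(tail_integral(*family(n), c) for n in range(1, n_max + 1))

for c in (2, 10, 50):
    print(c, sup_tail(spike, c), sup_tail(thin_spike, c))
```

For the first family the supremum stays at 1 however large $c$ is, so the family is not uniformly integrable; for the second it equals $1/c$, which tends to 0 even though the functions are not uniformly bounded.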
One basic application of uniform integrability is the following extension of 
Fatou’s lemma and the dominated convergence theorem. 


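6.5.2 Theorem. Let $f_1, f_2, \ldots$ be real-valued and uniformly integrable.

(a)
$$\int_\Omega \bigl(\liminf_n f_n\bigr)\, d\mu \le \liminf_n \int_\Omega f_n\, d\mu \le \limsup_n \int_\Omega f_n\, d\mu \le \int_\Omega \bigl(\limsup_n f_n\bigr)\, d\mu.$$

(b) If $f_n \to f$ a.e. or in measure, then $f$ is integrable and
$$\int_\Omega f_n\, d\mu \to \int_\Omega f\, d\mu.$$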


Proof. (a) We have
$$\int_\Omega f_n\, d\mu = \int_{\{f_n < -c\}} f_n\, d\mu + \int_{\{f_n \ge -c\}} f_n\, d\mu, \quad c > 0.$$
By uniform integrability, $c$ may be chosen so large that $\bigl|\int_{\{f_n < -c\}} f_n\, d\mu\bigr|
< \varepsilon$ for all $n$, where $\varepsilon > 0$ is preassigned. Since $f_n I_{\{f_n \ge -c\}} \ge -c$, which is
integrable since $\mu$ is finite, Fatou's lemma yields
$$\liminf_n \int_{\{f_n \ge -c\}} f_n\, d\mu \ge \int_\Omega \liminf_n \bigl(f_n I_{\{f_n \ge -c\}}\bigr)\, d\mu.$$
Since $f_n I_{\{f_n \ge -c\}} \ge f_n$, this integral is in turn greater than or equal to
$\int_\Omega (\liminf_n f_n)\, d\mu$. Thus
$$\liminf_n \int_\Omega f_n\, d\mu \ge \int_\Omega \bigl(\liminf_n f_n\bigr)\, d\mu - \varepsilon,$$
proving the lim inf part. The lim sup part is done by a symmetrical argument.

(b) This is immediate from (a) if $f_n \to f$ a.e., so assume $f_n \to f$ in
measure. By 2.5.3, there is a subsequence $f_{n_k} \to f$ a.e.; hence by (a) applied
to the $f_{n_k}$, $f$ is integrable and $\int_\Omega f_{n_k}\, d\mu \to \int_\Omega f\, d\mu$. If $\int_\Omega f_n\, d\mu$ does not
converge to $\int_\Omega f\, d\mu$, then for some $\varepsilon > 0$ we have $\bigl|\int_\Omega f_n\, d\mu - \int_\Omega f\, d\mu\bigr| \ge \varepsilon$
for infinitely many $n$, and for convenience in notation we may assume this
holds for all $n$. But then we find a subsequence $f_{m_k} \to f$ a.e., and as above,
$\int_\Omega f_{m_k}\, d\mu \to \int_\Omega f\, d\mu$, a contradiction. □


We now establish a useful criterion for uniform integrability. 


6.5.3 Theorem. The complex-valued Borel measurable functions $f_n$ are
uniformly integrable iff the integrals $\int_\Omega |f_n|\, d\mu$ are uniformly bounded and also
uniformly continuous, that is, $\int_A |f_n|\, d\mu \to 0$ as $\mu(A) \to 0$, uniformly in $n$.


Proof. Assume the integrals are uniformly bounded and uniformly continuous.
Then
$$\mu\{|f_n| \ge c\} \le \frac{1}{c} \int_\Omega |f_n|\, d\mu$$
by Chebyshev's inequality, and this approaches 0 as $c \to \infty$, uniformly in $n$,
by the uniform boundedness. Thus $\int_{\{|f_n| \ge c\}} |f_n|\, d\mu \to 0$ as $c \to \infty$, uniformly
in $n$, by the uniform continuity.

Conversely, assume uniform integrability. We have
$$\begin{aligned}
\int_A |f_n|\, d\mu &= \int_{A \cap \{|f_n| \ge c\}} |f_n|\, d\mu + \int_{A \cap \{|f_n| < c\}} |f_n|\, d\mu \\
&\le \int_{\{|f_n| \ge c\}} |f_n|\, d\mu + c\,\mu(A). \qquad (1)
\end{aligned}$$
Choose $c$ so that $\int_{\{|f_n| \ge c\}} |f_n|\, d\mu < \varepsilon/2$ for all $n$; if $\mu(A) < \varepsilon/2c$, then by (1),
$\int_A |f_n|\, d\mu < (\varepsilon/2) + (\varepsilon/2) = \varepsilon$ for all $n$, proving uniform continuity. Uniform
boundedness was verified in 6.5.1. □


We have seen in 2.5.1 that $L^p$ convergence implies convergence in measure.
The converse holds under an additional hypothesis of uniform integrability.


6.5.4 Theorem. Let $\mu$ be a finite measure on $(\Omega, \mathcal{F})$, and let $0 < p < \infty$.
If $f_n \xrightarrow{\mu} f$ and the $|f_n|^p$ are uniformly integrable, then $f_n \xrightarrow{L^p} f$.


Proof. First assume that the $|f_n - f|^p$ are uniformly integrable. By 2.5.3,
there is a subsequence $\{f_{n_k}\}$ converging to $f$ a.e. and in measure. By 6.5.2(b),
$\int_\Omega |f_{n_k} - f|^p\, d\mu \to 0$ as $k \to \infty$. The same argument shows that any
subsequence of $\{f_n\}$ has a subsequence converging to $f$ in $L^p$. Hence $f_n \xrightarrow{L^p} f$,
for if not, there would be an $\varepsilon > 0$ and a subsequence $\{f_{n_i}\}$ such that
$\int_\Omega |f_{n_i} - f|^p\, d\mu \ge \varepsilon$ for all $i$.

Now assume the $|f_n|^p$ to be uniformly integrable. Then $|f_n - f|^p \le |f_n|^p
+ |f|^p$ if $p \le 1$, and $|f_n - f|^p \le 2^{p-1}(|f_n|^p + |f|^p)$ if $p \ge 1$. (See 2.4.6
and the end of 2.4.12.) As above, we have a subsequence $f_{n_k} \to f$ a.e. By
6.5.2(b), $|f|^p$ is integrable, and it follows that the integrals $\int_\Omega |f_n - f|^p\, d\mu$
are uniformly bounded and uniformly continuous. [Note that $\int_A |f|^p\, d\mu \to 0$
as $\mu(A) \to 0$ by 2.2.5(e).] By 6.5.3, the $|f_n - f|^p$ are uniformly integrable,
and the previous argument applies. □


6.5.5 Corollary. In 6.5.2(b), $f_n \xrightarrow{L^1} f$, that is, $\int_\Omega |f_n - f|\, d\mu \to 0$.


Proof. The $|f_n|$ are uniformly integrable by hypothesis, and $f_n \xrightarrow{\mu} f$
either by hypothesis or by 2.5.5. □


The following result will be needed in the next section. 


6.5.6 Lemma. Let $f_i$, $i \in I$, be integrable functions on $(\Omega, \mathcal{F}, \mu)$. If
$h: [0, \infty) \to [0, \infty)$ is Borel measurable, $h(t)/t \to \infty$ as $t \to \infty$, and
$\sup_{i \in I} \int_\Omega h(|f_i|)\, d\mu < \infty$, then the $f_i$ are uniformly integrable. For example,
take $h(t) = t^p$, $p > 1$; thus if $\int_\Omega |f_n|^p\, d\mu \le M < \infty$ for all $n$, then the $f_n$ are
uniformly integrable.


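Proof. Given $\varepsilon > 0$, let $M = \sup_{i \in I} \int_\Omega h(|f_i|)\, d\mu$ and set $a = M/\varepsilon$. There is
a positive number $c$ such that $h(t)/t \ge a$ for $t \ge c$. Thus
$$\int_{\{|f_i| \ge c\}} |f_i|\, d\mu \le \int_{\{|f_i| \ge c\}} \frac{h(|f_i|)}{a}\, d\mu \le \frac{1}{a} \int_\Omega h(|f_i|)\, d\mu \le \frac{M}{a} = \varepsilon. \quad \square$$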
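For the special case $h(t) = t^p$, $p > 1$, mentioned in the lemma, the estimate can be written out directly. The following worked bound is a sketch under the lemma's hypothesis $\int_\Omega |f_n|^p\, d\mu \le M$ for all $n$; it uses only the fact that $|f_n|^{p-1} \ge c^{p-1}$ on the set $\{|f_n| \ge c\}$.

```latex
\begin{align*}
\int_{\{|f_n| \ge c\}} |f_n| \, d\mu
  &= \int_{\{|f_n| \ge c\}} \frac{|f_n|^p}{|f_n|^{p-1}} \, d\mu
   \le \frac{1}{c^{\,p-1}} \int_{\Omega} |f_n|^p \, d\mu
   \le \frac{M}{c^{\,p-1}} \longrightarrow 0
   \quad \text{as } c \to \infty ,
\end{align*}
```

and the bound $M/c^{p-1}$ does not depend on $n$, which is exactly the uniformity required.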
Problems 
1. If $0 < p < \infty$, $f, f_1, f_2, \ldots \in L^p$, and $f_n \xrightarrow{L^p} f$, show that the $|f_n|^p$
are uniformly integrable.


2. Give an example of a uniformly integrable sequence of functions $f_n$ on a
measure space $(\Omega, \mathcal{F}, \mu)$, $\mu$ finite, such that the $|f_n|$ cannot be bounded
above by an integrable function.
6.6 UNIFORM INTEGRABILITY AND MARTINGALE THEORY

The application of the uniform integrability concept yields many additional
facts about martingales. In particular, we shall find that a martingale
is uniformly integrable iff it can be represented as $X_n = E(Y \mid \mathcal{F}_n)$ for some
integrable random variable $Y$. Thus $Y$ can be considered a “last element” of
the martingale $\{X_n, \mathcal{F}_n\}$.


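6.6.1 Lemma. Let $Y$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$ and
let $\mathcal{G}_i$, $i \in I$, be arbitrary sub $\sigma$-fields of $\mathcal{F}$. Then the random variables
$X_i = E(Y \mid \mathcal{G}_i)$, $i \in I$, are uniformly integrable, that is,
$$\int_{\{|X_i| \ge c\}} |X_i|\, dP \to 0 \quad \text{as } c \to \infty$$
uniformly in $i$.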


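Proof. As $|E(Y \mid \mathcal{G}_i)| \le E(|Y| \mid \mathcal{G}_i)$,
$$\int_{\{|X_i| \ge c\}} |X_i|\, dP \le \int_{\{|X_i| \ge c\}} E(|Y| \mid \mathcal{G}_i)\, dP = \int_{\{|X_i| \ge c\}} |Y|\, dP$$
since $\{|X_i| \ge c\} \in \mathcal{G}_i$ (remember $X_i$ is $\mathcal{G}_i$-measurable). But by Chebyshev's
inequality,
$$P\{|X_i| \ge c\} \le \frac{E(|X_i|)}{c} \le c^{-1} E[E(|Y| \mid \mathcal{G}_i)] = c^{-1} E(|Y|) \to 0 \quad \text{as } c \to \infty$$
uniformly in $i$. Since $Y$ is integrable, $\int_A |Y|\, dP \to 0$ as $P(A) \to 0$, and the
result follows. □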


The following result, due to Lévy (1937), was historically the first of the 
martingale convergence theorems. 


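6.6.2 Theorem. Let $\{\mathcal{F}_n\}$ be an increasing sequence of sub $\sigma$-fields of $\mathcal{F}$,
and let $\mathcal{F}_\infty$ be the $\sigma$-field generated by $\bigcup_{n=1}^\infty \mathcal{F}_n$. If $Y$ is integrable and $X_n
= E(Y \mid \mathcal{F}_n)$, $n = 1, 2, \ldots$, then $X_n \to E(Y \mid \mathcal{F}_\infty)$ a.e. and in $L^1$.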


Proof. $\{X_n, \mathcal{F}_n\}$ is a martingale by 6.3.3(c), and is uniformly integrable
by 6.6.1. Since $E(|X_n|) \le E(|Y|) < \infty$, $X_n$ converges a.e. to an integrable
random variable $X_\infty$ by 6.4.3; $L^1$ convergence follows from 6.5.5. It remains to
show that $X_\infty = E(Y \mid \mathcal{F}_\infty)$ a.e.

If $A \in \mathcal{F}_k$ and $n \ge k$, then
$$\int_A Y\, dP = \int_A E(Y \mid \mathcal{F}_n)\, dP = \int_A X_n\, dP \to \int_A X_\infty\, dP$$
by $L^1$ convergence. Thus $\int_A Y\, dP = \int_A X_\infty\, dP$ for all $A$ in the field $\bigcup_{k=1}^\infty \mathcal{F}_k$,
and hence for all $A \in \mathcal{F}_\infty$ (monotone class theorem). Since $X_n$ is $\mathcal{F}_n \subset \mathcal{F}_\infty$-
measurable, $X_\infty$ is $\mathcal{F}_\infty$-measurable, and therefore $X_\infty = E(Y \mid \mathcal{F}_\infty)$ a.e. by
definition of conditional expectation. □


A result similar to 6.6.2 holds if the o-fields form a decreasing sequence. 


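6.6.3 Theorem. Let $\{\mathcal{F}_n\}$ be a decreasing sequence of sub $\sigma$-fields of $\mathcal{F}$,
and let $\mathcal{F}_\infty = \bigcap_{n=1}^\infty \mathcal{F}_n$. If $Y$ is integrable and $X_n = E(Y \mid \mathcal{F}_n)$, $n = 1, 2, \ldots$,
then $X_n \to E(Y \mid \mathcal{F}_\infty)$ a.e. and in $L^1$.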


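Proof. Just as in 6.6.2 (using 6.4.4 instead of 6.4.3), $X_n \to X_\infty$ a.e. and in
$L^1$, so we must show that $X_\infty = E(Y \mid \mathcal{F}_\infty)$ a.e. If $A \in \mathcal{F}_\infty \subset \mathcal{F}_n$, then
$$\int_A Y\, dP = \int_A E(Y \mid \mathcal{F}_n)\, dP = \int_A X_n\, dP \to \int_A X_\infty\, dP.$$
Since $X_n$ is $\mathcal{F}_n \subset \mathcal{F}_k$-measurable for $n \ge k$, $X_\infty$ is $\mathcal{F}_k$-measurable for all $k$;
hence $X_\infty$ is $\mathcal{F}_\infty$-measurable. □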


6.6.4 Comments. Let $Z_i: (\Omega, \mathcal{F}) \to (\Omega_i', \mathcal{F}_i')$ be a random object ($i = 1,
2, \ldots$). In 6.6.2, if $\mathcal{F}_n = \sigma(Z_1, \ldots, Z_n)$, then $\mathcal{F}_\infty = \sigma(Z_1, Z_2, \ldots)$ (see
Problem 1). Thus $E(Y \mid Z_1, \ldots, Z_n) \to E(Y \mid Z_1, Z_2, \ldots)$ a.e. and in $L^1$. In 6.6.3, if
$\mathcal{F}_n = \sigma(Z_n, Z_{n+1}, \ldots)$, then $\mathcal{F}_\infty$ is the tail $\sigma$-field of the $Z_n$ (see 6.2.6).

We now show that uniform integrability of a submartingale implies a.e. and
$L^1$ convergence, and also implies that a last element can be attached.


6.6.5 Theorem. Let $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ be a uniformly integrable
submartingale. Then $\sup_n E(X_n^+) < \infty$, and $X_n$ converges to a limit $X_\infty$ a.e.
and in $L^1$. Furthermore, if $\mathcal{F}_\infty$ is the $\sigma$-field generated by $\bigcup_{n=1}^\infty \mathcal{F}_n$, then
$\{X_n, \mathcal{F}_n, n = 1, 2, \ldots, \infty\}$ is a submartingale. If $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ is a
uniformly integrable martingale, so is $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots, \infty\}$. [If $\{X_n, \mathcal{F}_n,
n = 1, 2, \ldots, \infty\}$ is a (sub- or super-) martingale, where $\mathcal{F}_\infty$ is the $\sigma$-field
generated by the $\mathcal{F}_n$, $X_\infty$ is said to be a last element.]


Proof. By 6.5.3, $\sup_n E(|X_n|) < \infty$, so by 6.4.3, $X_n \to X_\infty$ a.e. By 6.5.5,
$X_n \xrightarrow{L^1} X_\infty$.

Now if $A \in \mathcal{F}_n$ and $k \ge n$, then by 6.3.2(c), $\int_A X_n\, dP \le \int_A X_k\, dP$.
Let $k \to \infty$; the $L^1$ convergence yields $\int_A X_n\, dP \le \int_A X_\infty\, dP$. Thus [6.3.2(c)
again] $X_n \le E(X_\infty \mid \mathcal{F}_n)$; hence $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots, \infty\}$ is a submartingale.
The last statement follows from the above argument with “$\le$” replaced by
“$=$”. □
Theorem 6.6.5 indicates that uniform integrability of a submartingale
implies some rather strong conclusions. In fact, a uniformly integrable martingale
must have a last element.


6.6.6 Theorem. $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ is a uniformly integrable martingale
iff there is an integrable random variable $Y$ such that $X_n = E(Y \mid \mathcal{F}_n)$,
$n = 1, 2, \ldots$; in this case, $X_n \to E(Y \mid \mathcal{F}_\infty)$ a.e. and in $L^1$, where $\mathcal{F}_\infty$ is the
$\sigma$-field generated by $\bigcup_{n=1}^\infty \mathcal{F}_n$.


Proof. The “if” part follows from 6.6.1 and 6.6.2. The “only if” part follows
from 6.6.5 with $Y = X_\infty$. □


In 6.6.6, if we require that $Y$ be $\mathcal{F}_\infty$-measurable, then $Y$ is unique (up
to sets of measure 0). For if $X_n = E(Y \mid \mathcal{F}_n)$, then $E(Y \mid \mathcal{F}_n) = E(X_\infty \mid \mathcal{F}_n)$,
$n = 1, 2, \ldots$, so $\int_A Y\, dP = \int_A X_\infty\, dP$ for all $A \in \bigcup_{n=1}^\infty \mathcal{F}_n$, and hence for all
$A \in \mathcal{F}_\infty$ (monotone class theorem). Thus $Y = X_\infty$ a.e. by 1.6.11.

A sub- or supermartingale with a last element need not be uniformly
integrable, but there are partial results in this direction.


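6.6.7 Theorem. Let $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots, \infty\}$ be a nonnegative
submartingale with a last element. Then the $X_n$ are uniformly integrable.


Proof.
$$\int_{\{X_n \ge c\}} X_n\, dP \le \int_{\{X_n \ge c\}} X_\infty\, dP,$$
and
$$P\{X_n \ge c\} \le \frac{E(X_n)}{c} \le \frac{E(X_\infty)}{c} \to 0 \quad \text{as } c \to \infty$$
uniformly in $n$. □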


We now give an example of a supermartingale $\{X_n, \mathcal{F}_n\}$ with a last element
that is not uniformly integrable. (Also, $\{-X_n, \mathcal{F}_n\}$ will be a submartingale with
a last element that is not uniformly integrable.) The key feature of this example
is that $\lim_{n \to \infty} X_n$ will be a last element when the sequence is regarded as a
supermartingale, but not when it is regarded as a martingale.


6.6.8 Example. Let $Y_1, Y_2, \ldots$ be independent, with $P\{Y_j = 1\} = p$,
$P\{Y_j = 0\} = 1 - p$, $0 < p < 1$. Let $X_n = p^{-n} \prod_{j=1}^n Y_j$, $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$.
Then $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ is a martingale, and hence a supermartingale
[see 6.3.3(b)]. Since all $X_n \ge 0$, we have $E(0 \mid \mathcal{F}_n) = 0 \le X_n$, so 0 is a last
element when the sequence is regarded as a supermartingale, but not when
it is regarded as a martingale. But the $X_n$ are not uniformly integrable;
for $P\{Y_j = 1 \text{ for all } j\} = \lim_n p^n = 0$; hence (a.e.) $X_n = 0$ eventually, so
$X_n \to 0$ a.e. If the $X_n$ were uniformly integrable, then $X_n \xrightarrow{L^1} 0$ by 6.5.5;
hence $E(X_n) \to 0$. But $E(X_n) = 1$ for all $n$, a contradiction. If we regard the
sequence as a martingale, there can be no last element. For if $X_\infty$ is a last
element, then $X_n = E(X_\infty \mid \mathcal{F}_n)$ for all $n$; hence by 6.6.1, the $X_n$ are uniformly
integrable.

Since (a.e.) $X_n = 0$ eventually, we have an example of a “fair” game in
which the gambler is almost certain to be wiped out. Thus the term “locally
fair” is perhaps more appropriate than “fair.”
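The distribution of $X_n$ in Example 6.6.8 is simple enough to tabulate exactly: $X_n$ equals $p^{-n}$ with probability $p^n$ and 0 otherwise. The sketch below assumes $p = 1/2$ and uses hypothetical helper names; it shows that the mean stays at 1 while the mass on $\{X_n \ne 0\}$ vanishes and the tail integrals do not become small uniformly in $n$.

```python
from fractions import Fraction

def law_of_X(n, p):
    # X_n = p^(-n) * Y_1 * ... * Y_n equals p^(-n) with probability p^n and 0 otherwise.
    return {p ** (-n): p ** n, Fraction(0): 1 - p ** n}

def mean(law):
    return sum(value * weight for value, weight in law.items())

def tail_integral(law, c):
    # Integral of X_n over the event {X_n >= c}.
    return sum(value * weight for value, weight in law.items() if value >= c)

p = Fraction(1, 2)
for n in (1, 5, 10, 20):
    law = law_of_X(n, p)
    # Columns: n, E(X_n), P{X_n != 0}, integral of X_n over {X_n >= 1000}.
    print(n, mean(law), float(1 - law[Fraction(0)]), tail_integral(law, 1000))
```

For any fixed $c$, the tail integral equals 1 as soon as $p^{-n} \ge c$, so it cannot tend to 0 uniformly in $n$.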

Note that a sub- or supermartingale with a last element converges a.e. [In
the submartingale case, for example, $\sup_n E(X_n^+) \le E(X_\infty^+) < \infty$.] But the
limit need not coincide with the last element.

We now look at the problem of $L^p$ convergence.


6.6.9 Theorem. Let $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ be a martingale or a nonnegative
submartingale with $E[|X_n|^p] \le M < \infty$ for all $n$, where $p > 1$. Then $X_n$
converges to a limit $X_\infty$ a.e. and in $L^p$.


Proof. By 6.5.6, the $X_n$ are uniformly integrable, so by 6.6.5, $X_n$ converges
a.e. to a limit $X_\infty$, and $X_\infty$ is a last element.

Now $\{|X_n|^p, \mathcal{F}_n, n = 1, 2, \ldots, \infty\}$ is a nonnegative submartingale. [In the
case in which $\{X_n, \mathcal{F}_n\}$ is a martingale, use 6.3.6(b) with $g(x) = |x|^p$; in the
nonnegative submartingale case, use 6.3.6(a) with $g(x) = x^p$, $x \ge 0$;
$g(x) = 0$, $x < 0$.] By 6.6.7, the $|X_n|^p$ are uniformly integrable, and by 6.5.4,
$X_n \to X_\infty$ in $L^p$. □


Problems 
1. Let $Z_n: (\Omega, \mathcal{F}) \to (\Omega_n', \mathcal{F}_n')$, $n = 1, 2, \ldots$, be random objects. If $\mathcal{F}_n
= \sigma(Z_1, \ldots, Z_n)$, $n = 1, 2, \ldots$, show that the $\sigma$-field generated by
$\bigcup_{n=1}^\infty \mathcal{F}_n$ is $\sigma(Z_1, Z_2, \ldots)$.

2. Let $\{X_n, \mathcal{F}_n\}$ be a nonnegative supermartingale, so that $X_n$ converges a.e.
to an integrable random variable $X_\infty$.

(a) Show that $E(X_n) \to E(X_\infty)$ iff the $X_n$ are uniformly integrable. (This
holds for any sequence of nonnegative integrable random variables
converging a.e. to an integrable limit; the supermartingale property
is not involved. Note also that we have $E[|X_n - X_\infty|] \to 0$ since we
have a.e. convergence and uniform integrability.)

(b) Show that $X_\infty$ is a last element.

(c) If $E(X_n) \to 0$, show that $X_n \to 0$ a.e.

3. Let $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ be a submartingale (respectively, a martingale),
and let $\mathcal{F}_\infty$ be the $\sigma$-field generated by the $\mathcal{F}_n$. If there is an $\mathcal{F}$-measurable,
integrable random variable $Y$ such that $E(Y \mid \mathcal{F}_n) \ge X_n$ a.e. [$E(Y \mid \mathcal{F}_n)
= X_n$ a.e. in the martingale case], then there is an $\mathcal{F}_\infty$-measurable,
integrable random variable $X_\infty$ such that $E(X_\infty \mid \mathcal{F}_n) \ge X_n$ a.e. [$E(X_\infty \mid \mathcal{F}_n)
= X_n$ a.e. in the martingale case].


6.7 OPTIONAL SAMPLING THEOREMS 

Let $\{X_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ be a martingale, with $X_n$ interpreted as a
gambler's total capital after $n$ plays of a game of chance. Suppose that after each
trial, the gambler decides either to quit or to keep playing. If $T$ is the time of
quitting, what can be said about the final capital $X_T$?

First of all, the random variable $T$ must have the property that if we observe
$X_1, \ldots, X_n$, we can come to a definite decision as to whether or not $T = n$.
A nonnegative random variable of this type is called a stopping time.


6.7.1 Definition. Let $\{\mathcal{F}_n, n = 0, 1, \ldots\}$ be an increasing sequence of sub
$\sigma$-fields of $\mathcal{F}$. A stopping time for the $\mathcal{F}_n$ is a map $T: \Omega \to \{0, 1, \ldots, \infty\}$
such that $\{T \le n\} \in \mathcal{F}_n$ for each nonnegative integer $n$. Since $\{T = n\}
= \{T \le n\} - \{T \le n - 1\}$ and $\{T \le n\} = \bigcup_{k=0}^n \{T = k\}$, the definition is
equivalent to the requirement that $\{T = n\} \in \mathcal{F}_n$ for all $n = 0, 1, \ldots$. If
$\{X_n, n = 0, 1, \ldots\}$ is a sequence of random variables, a stopping time for $\{X_n\}$
is, by definition, a stopping time relative to the $\sigma$-fields $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$.
(The above definitions are modified in the obvious way if the index $n$ starts
from 1 rather than 0.)

If $S$ and $T$ are stopping times, so are $S \vee T = \max(S, T)$ and $S \wedge T
= \min(S, T)$ ($\{S \vee T \le n\} = \{S \le n\} \cap \{T \le n\}$, $\{S \wedge T \le n\} = \{S \le n\} \cup
\{T \le n\}$). Also, if $T \equiv n$ then $T$ is a stopping time.

By far the most important example of a stopping time is the hitting time
of a set. If $\{X_n\}$ is a sequence of random variables and $B \in \mathcal{B}(R)$, let $T(\omega)
= \min\{n: X_n(\omega) \in B\}$ if $X_n(\omega) \in B$ for some $n$; $T(\omega) = \infty$ if $X_n(\omega)$ is never
in $B$. $T$ is a stopping time since $\{T \le n\} = \bigcup_{k \le n} \{X_k \in B\} \in \sigma(X_k, k \le n)$.
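The hitting time of a set is easy to compute along a single sample path, and doing so makes the stopping-time property visible: whether $T \le n$ can be decided from $X_0, \ldots, X_n$ alone. The sketch below is illustrative; the path, the set $B = [2, \infty)$, and the function names are hypothetical choices.

```python
import math

def hitting_time(path, in_B):
    # T = min{n : X_n in B}, with T = infinity if the path never enters B.
    # The path is X_0, X_1, ..., so indices start at 0.
    for n, x in enumerate(path):
        if in_B(x):
            return n
    return math.inf

def stopped_by(path, in_B, n):
    # Decide whether {T <= n} occurred using only X_0, ..., X_n.
    return hitting_time(path[: n + 1], in_B) <= n

def in_B(x):
    # Hypothetical Borel set B = [2, infinity).
    return x >= 2

path = [0, 1, 0, -1, -2, -1, 0, 1, 2, 3]
print(hitting_time(path, in_B))  # 8
print([stopped_by(path, in_B, n) for n in range(len(path))])
```

Truncating the path after time $n$ never changes the answer to "has $T \le n$ occurred?", which is the content of $\{T \le n\} \in \sigma(X_k, k \le n)$.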

If $T$ is a stopping time for $\{X_n\}$, an event $A$ is said to be “prior to $T$” iff,
whenever $T = n$, we can tell by examination of the $X_k$, $k \le n$, whether or not
$A$ has occurred. The formal definition is as follows.


6.7.2 Definition. Let $T$ be a stopping time for the $\sigma$-fields $\mathcal{F}_n$, $n = 0, 1, \ldots$,
and let $A$ belong to $\mathcal{F}$. The set $A$ is said to be prior to $T$ iff $A \cap \{T \le n\}
\in \mathcal{F}_n$ for all $n = 0, 1, \ldots$. [Equivalently, as in 6.7.1, $A \cap \{T = n\} \in \mathcal{F}_n$ for all
$n = 0, 1, \ldots$.] The collection of all sets prior to $T$ will be denoted by $\mathcal{F}_T$; it
follows quickly that $\mathcal{F}_T$ is a $\sigma$-field. Also, if $T \equiv n$ then $\mathcal{F}_T$ is simply $\mathcal{F}_n$.
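If $S$ and $T$ are stopping times and $S \le T$, then $\mathcal{F}_S \subset \mathcal{F}_T$. For if $A \in \mathcal{F}_S$
then
$$A \cap \{T \le k\} = \bigcup_{i=1}^k \bigl[A \cap \{S = i\} \cap \{T \le k\}\bigr].$$
But $A \cap \{S = i\} \in \mathcal{F}_i \subset \mathcal{F}_k$ and $\{T \le k\} \in \mathcal{F}_k$; hence $A \in \mathcal{F}_T$.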

If the stopping time $T$ is constant at $n$, then $X_T$ is $\mathcal{F}_T$-measurable. We
would like this idea to carry over to a general stopping time. Formally, let
$T$ be a finite stopping time for the $\sigma$-fields $\mathcal{F}_n$, and define $X_T$ in the natural
way; if $T(\omega) = n$, let $X_T(\omega) = X_n(\omega)$. If $B \in \mathcal{B}(R)$, then $\{X_T \in B\} \in \mathcal{F}_T$, in
other words, $X_T$ is $\mathcal{F}_T$-measurable. (Since $\mathcal{F}_T \subset \mathcal{F}$ by definition, it follows
in particular that $X_T$ is a random variable.) To see this, write
$$\{X_T \in B\} \cap \{T \le n\} = \bigcup_{k=0}^n \bigl[\{X_k \in B\} \cap \{T = k\}\bigr].$$
Since $\{X_k \in B\} \cap \{T = k\} \in \mathcal{F}_k$ for $k \le n$, we have
$$\{X_T \in B\} \cap \{T \le n\} \in \mathcal{F}_n.$$
Also, as $T$ is finite, we have $\bigcup_{n=0}^\infty \{T \le n\} = \Omega$, so that $\{X_T \in B\} \in \mathcal{F}$, as
desired.

If $T$ is not necessarily finite, the same argument shows that $I_{\{T < \infty\}} X_T$ is
$\mathcal{F}_T$-measurable.

Now in the gambling situation described at the beginning of the section, a
basic quantity of interest is $E(X_T)$, the average accumulation at the quitting
time. For example, if $E(X_T)$ turns out to be the same as $E(X_1)$ [$= E(X_n)$ for
all $n$ by the martingale property], the gambler's strategy does not offer any
improvement over the procedure of stopping at a fixed time. Now in comparing
$X_1$ and $X_T$ we are considering two stopping times $S$ and $T$ ($S \equiv 1$) with
$S \le T$, and looking at $X_S$ versus $X_T$. More generally, if $T_1 \le T_2 \le \cdots$ form
an increasing sequence of finite stopping times, we may examine the sequence
$X_{T_1}, X_{T_2}, \ldots$. If the sequence forms a martingale, then $E(X_{T_n}) = E(X_{T_1})$ for
all $n$, and if $T_1 \equiv 1$, then $E(X_{T_n}) = E(X_1)$.

Thus if we sample the gambler's fortune at random times $T_1, T_2, \ldots$, the
basic question is whether the martingale (or submartingale) property is
preserved. This will always be the case when the sequence $\{X_n\}$ is finite.


6.7.3 Theorem. Let $\{X_n, \mathscr{F}_n, n = 1, \ldots, m\}$ be a submartingale, and let $T_1$,
$T_2, \ldots$ be an increasing sequence of stopping times for the $\mathscr{F}_n$. [In other
words, the $T_n$ take values in $\{1, \ldots, m\}$, and $\{T_n \le k\} \in \mathscr{F}_k$, $k = 1, \ldots, m$.]
The σ-fields $\mathscr{F}_{T_n}$ are defined as before:

$$\mathscr{F}_{T_n} = \{A \in \mathscr{F}: A \cap \{T_n \le k\} \in \mathscr{F}_k,\ k = 1, \ldots, m\}.$$

Then the $X_{T_n}$ form a submartingale relative to the σ-fields $\mathscr{F}_{T_n}$, a martingale
if $\{X_n\}$ is a martingale.


PROOF. We follow Breiman (1968). Define $Y_n = X_{T_n}$, and note that each $Y_n$
is integrable:

$$\int_\Omega |X_{T_n}|\,dP = \sum_{i=1}^{m} \int_{\{T_n = i\}} |X_i|\,dP \le \sum_{i=1}^{m} E(|X_i|) < \infty.$$

As the $T_n$ increase with n, so do the $\mathscr{F}_{T_n}$ (see the discussion after 6.7.2).

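Now if $A \in \mathscr{F}_{T_n}$, we must show that $\int_A Y_{n+1}\,dP \ge \int_A Y_n\,dP$ (with equality
in the martingale case). Since $A = \bigcup_j [A \cap \{T_n = j\}]$, it suffices to replace A
by $D_j = A \cap \{T_n = j\}$, which belongs to $\mathscr{F}_j$; now if $k > j$, we note that $T_n = j$
implies $T_{n+1} \ge j$, so that

$$\int_{D_j} Y_{n+1}\,dP = \sum_{i=j}^{k} \int_{D_j \cap \{T_{n+1} = i\}} Y_{n+1}\,dP + \int_{D_j \cap \{T_{n+1} > k\}} Y_{n+1}\,dP.$$

Thus

$$\int_{D_j} Y_{n+1}\,dP = \sum_{i=j}^{k} \int_{D_j \cap \{T_{n+1} = i\}} X_i\,dP + \int_{D_j \cap \{T_{n+1} > k\}} X_k\,dP - \int_{D_j \cap \{T_{n+1} > k\}} (X_k - Y_{n+1})\,dP. \qquad (1)$$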

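Now combine the i = k term in (1) with the $\int X_k\,dP$ term to obtain

$$\int_{D_j \cap \{T_{n+1} = k\}} X_k\,dP + \int_{D_j \cap \{T_{n+1} > k\}} X_k\,dP = \int_{D_j \cap \{T_{n+1} \ge k\}} X_k\,dP \ge \int_{D_j \cap \{T_{n+1} \ge k\}} X_{k-1}\,dP$$

since $\{T_{n+1} \ge k\} = \{T_{n+1} \le k - 1\}^c \in \mathscr{F}_{k-1}$ and $D_j \in \mathscr{F}_j \subset \mathscr{F}_{k-1}$. But

$$\int_{D_j \cap \{T_{n+1} \ge k\}} X_{k-1}\,dP = \int_{D_j \cap \{T_{n+1} > k-1\}} X_{k-1}\,dP,$$

so this term may be combined with the i = k − 1 term of (1) to obtain

$$\int_{D_j \cap \{T_{n+1} > k-2\}} X_{k-1}\,dP.$$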
Proceeding inductively, we find

$$\int_{D_j} Y_{n+1}\,dP \ge \int_{D_j \cap \{T_{n+1} \ge j\}} X_j\,dP - \int_{D_j \cap \{T_{n+1} > k\}} (X_k - Y_{n+1})\,dP. \qquad (2)$$

Now $\{T_{n+1} > k\}$ is empty for $k \ge m$. Finally, $D_j \cap \{T_{n+1} \ge j\} = D_j$ since
$D_j \subset \{T_n = j\}$, and $X_j = Y_n$ on $D_j$. Thus

$$\int_{D_j} Y_{n+1}\,dP \ge \int_{D_j} Y_n\,dP$$

as desired. In the martingale case, all inequalities in the proof become
equalities. □

Theorem 6.7.3 extends immediately to the case of an infinite sequence if
each $T_n$ is bounded, that is, for each n there is a positive constant $K_n$ such that
$T_n \le K_n$ a.e. The same proof may be used; the key point is that $\{T_{n+1} > k\}$
is still empty for sufficiently large k.

When $\{X_n\}$ is an infinite sequence, the martingale or submartingale property
is not preserved in general, but the following result gives useful sufficient
conditions.


6.7.4 Optional Sampling Theorem. Let $\{X_1, X_2, \ldots\}$ be a submartingale, and
let $T_1, T_2, \ldots$ be an increasing sequence of finite stopping times for $\{X_n\}$, with
$Y_n = X_{T_n}$, n = 1, 2, .... If

(A) $E(|Y_n|) < \infty$ for all n, and

(B) $\liminf_{k \to \infty} \int_{\{T_n > k\}} |X_k|\,dP = 0$ for all n,

then $\{Y_n\}$ is a submartingale relative to the σ-fields $\mathscr{F}_{T_n}$. If $\{X_n\}$ is a martingale,
so is $\{Y_n\}$.


PROOF. Since integrability of the $Y_n$ is now hypothesis (A), we can follow
the proof of 6.7.3 to (2). The first integral on the right-hand side is $\int_{D_j} Y_n\,dP$
as before, but in the second integral, we no longer have $\{T_{n+1} > k\}$ empty
for large k. But by hypothesis (B), $\int_{D_j \cap \{T_{n+1} > k\}} X_k\,dP \to 0$ as $k \to \infty$ through
an appropriate subsequence, and $\int_{D_j \cap \{T_{n+1} > k\}} Y_{n+1}\,dP \to 0$ as $k \to \infty$ since
$\{T_{n+1} > k\}$ decreases to the empty set. Thus

$$\int_{D_j} Y_{n+1}\,dP \ge \int_{D_j} Y_n\,dP$$

as desired. As before, all inequalities become equalities in the martingale
case. □
If $\{X_n\}$ is a submartingale with a last element $X_\infty$, we can define the random
variable $X_T$ for any stopping time T. On the set $\{T = \infty\}$, $X_T = X_\infty$. In this
case, the optional sampling theorem holds.


6.7.5 Theorem. If $\{T_n\}$ is an increasing sequence of stopping times (not
necessarily finite) for a submartingale $\{X_n\}$ having a last element $X_\infty$, then
$\{X_{T_n}\}$ is a submartingale relative to the σ-fields $\mathscr{F}_{T_n}$; if $\{X_n\}$ is a martingale, so
is $\{X_{T_n}\}$. In particular, this holds if the $X_n$ are uniformly integrable (see 6.6.5).


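PROOF. Case 1: $X_n \le 0$ for all n, and $X_\infty = 0$. For any fixed n, let $S_k
= T_n \wedge k = \min(T_n, k)$, k = 1, 2, ...; it is easily checked that $S_k$ is a stopping
time for the $X_n$. Now, if $T_n$ is finite, $X_{S_k} \to Y_n = X_{T_n}$ as $k \to \infty$; hence by
Fatou's lemma,

$$\limsup_{k \to \infty} \int_\Omega X_{S_k}\,dP \le \int_\Omega Y_n\,dP \le 0.$$

The same conclusion holds for arbitrary $T_n$, because on the set $\{T_n = \infty\}$ we
have $X_{S_k} = X_k \le 0 = X_{T_n}$. But by 6.7.3, $\{X_{S_k}\}$ is a submartingale; hence

$$\int_\Omega X_{S_k}\,dP \ge \int_\Omega X_{S_1}\,dP = \int_\Omega X_1\,dP,$$

which is finite. Therefore $Y_n$ is integrable.

Again by 6.7.3, $\{X_{T_n \wedge k}, \mathscr{F}_{T_n \wedge k}, n = 1, 2, \ldots\}$ is a submartingale (k fixed).
Thus

$$\int_A X_{T_n \wedge k}\,dP \le \int_A X_{T_{n+1} \wedge k}\,dP$$

if $A \in \mathscr{F}_{T_n \wedge k}$. But if $A \in \mathscr{F}_{T_n}$, then $A \cap \{T_n \le k\} \in \mathscr{F}_{T_n \wedge k}$, for

$$A \cap \{T_n \le k\} \cap \{T_n \wedge k \le i\} = \begin{cases} A \cap \{T_n \le i\} & \text{for } i \le k, \\ A \cap \{T_n \le k\} & \text{for } i > k. \end{cases}$$

Thus

$$\int_{A \cap \{T_n \le k\}} X_{T_n \wedge k}\,dP \le \int_{A \cap \{T_n \le k\}} X_{T_{n+1} \wedge k}\,dP, \qquad A \in \mathscr{F}_{T_n}.$$

But on $\{T_n \le k\}$, $T_n \wedge k = T_n$; also, $\{T_{n+1} \le k\} \subset \{T_n \le k\}$ and $X_{T_{n+1} \wedge k} \le 0$;
hence

$$\int_{A \cap \{T_n \le k\}} X_{T_n}\,dP \le \int_{A \cap \{T_{n+1} \le k\}} X_{T_{n+1}}\,dP.$$

Let $k \to \infty$ to obtain

$$\int_{A \cap \{T_n < \infty\}} X_{T_n}\,dP \le \int_{A \cap \{T_{n+1} < \infty\}} X_{T_{n+1}}\,dP.$$

As $X_{T_n} = 0$ on $\{T_n = \infty\}$ and $X_{T_{n+1}} = 0$ on $\{T_{n+1} = \infty\}$, we get

$$\int_A X_{T_n}\,dP \le \int_A X_{T_{n+1}}\,dP,$$

which is the desired submartingale property.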


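Case 2: $X_n = E(X_\infty | \mathscr{F}_n)$, n = 1, 2, .... In this case, $X_{T_n \wedge k}
= E(X_k | \mathscr{F}_{T_n \wedge k}) = E(X_\infty | \mathscr{F}_{T_n \wedge k})$ by 6.7.3 and 5.5.10(a). Therefore, the $X_{T_n \wedge k}$,
k = 1, 2, ... are uniformly integrable by 6.6.1.

Now if $B \in \mathscr{F}_{T_n}$, then $B \cap \{T_n \le k\} \in \mathscr{F}_{T_n \wedge k}$ (see case 1). Thus

$$\int_{B \cap \{T_n \le k\}} X_{T_n \wedge k}\,dP = \int_{B \cap \{T_n \le k\}} X_\infty\,dP.$$

Let $k \to \infty$ and use the uniform integrability of the $X_{T_n \wedge k}$ to obtain

$$\int_{B \cap \{T_n < \infty\}} X_{T_n}\,dP = \int_{B \cap \{T_n < \infty\}} X_\infty\,dP.$$

But on $\{T_n = \infty\}$ we have $X_{T_n} = X_\infty$, so

$$\int_B X_{T_n}\,dP = \int_B X_\infty\,dP \quad \text{for every } B \in \mathscr{F}_{T_n}.$$

Therefore $X_{T_n} = E(X_\infty | \mathscr{F}_{T_n})$, so that $X_{T_1}, X_{T_2}, \ldots$ is a martingale.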

General Case: Write $X_n = X_n' + X_n''$, where $X_n' = X_n - E(X_\infty | \mathscr{F}_n)$, $X_n''
= E(X_\infty | \mathscr{F}_n)$. The $X_n'$ fall into case 1 and the $X_n''$ into case 2, and the result
follows. Note that if $\{X_n\}$ is a martingale with last element $X_\infty$, we must have
$X_n = E(X_\infty | \mathscr{F}_n)$ so that $\{Y_n\}$ is a martingale by the analysis of case 2. □


To conclude this section we give an example of a situation in which the
optional sampling theorem does not apply. Consider the problem of fair coin
tossing, that is, let $Y_1, Y_2, \ldots$ be independent random variables, each taking
on values ±1 with equal probability. If $X_n = Y_1 + \cdots + Y_n$, the $X_n$ form a
martingale by 6.3.3(a). Now with probability 1, $X_n = 1$ for some n. (This is
a standard random walk result; for a proof, see Ash, 1970, p. 185.) If T is the
time that 1 is reached (the hitting time for {1}), and $S \equiv 1$, then S and T are
(a.e.) finite stopping times, but $\{X_S, X_T\}$ is not a martingale. For if this were
the case, we would have $E(X_T) = E(X_S) = E(X_1) = 0$. But $X_T \equiv 1$; hence
$E(X_T) = 1$, a contradiction. In addition, we obtain from 6.7.5 the result that
the $X_n$ are not uniformly integrable.
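
The contrast between the bounded stopping times $T \wedge n$ (covered by 6.7.3) and the unbounded time T can be checked by exact computation. The sketch below is an illustration only; the function names are hypothetical. It propagates the exact distribution of the stopped walk $X_{T \wedge n}$ with rational arithmetic, so that $E(X_{T \wedge n}) = 0$ can be compared with $P\{T \le n\}$, which increases toward 1 while $X_T \equiv 1$.

```python
from fractions import Fraction

def stopped_walk_distribution(n):
    """Exact law of X_{T ^ n} for the fair +-1 walk, T = hitting time of 1.

    Returns a dict position -> probability after n tosses, with the walk
    frozen once it reaches 1.
    """
    dist = {0: Fraction(1)}  # before any toss the walk sits at 0
    for _ in range(n):
        nxt = {}
        for pos, prob in dist.items():
            if pos == 1:  # already stopped: stay at 1
                nxt[1] = nxt.get(1, 0) + prob
            else:
                for step in (-1, 1):
                    nxt[pos + step] = nxt.get(pos + step, 0) + prob / 2
        dist = nxt
    return dist

def stopped_mean_and_hit_probability(n):
    """Return (E(X_{T ^ n}), P{T <= n}) as exact fractions."""
    dist = stopped_walk_distribution(n)
    mean = sum(pos * prob for pos, prob in dist.items())
    return mean, dist.get(1, Fraction(0))

if __name__ == "__main__":
    for n in (1, 3, 10, 50):
        print(n, stopped_mean_and_hit_probability(n))
```

For every n the mean of the stopped walk is exactly 0, in agreement with 6.7.3; the expectation jumps to 1 only in the limit, which is where uniform integrability fails.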
Problems


1. Let $\{X_1, \ldots, X_n\}$ be a submartingale, and let T be a stopping time for
$\{X_i, 1 \le i \le n\}$. Show that

$$E(|X_T|) \le 2E(X_n^+) - E(X_1).$$

The corresponding result for supermartingales, which may be obtained by
replacing $X_i$ by $-X_i$, is

$$E(|X_T|) \le 2E(X_n^-) + E(X_1).$$

2. Let $\{X_1, X_2, \ldots\}$ be a submartingale, and T a finite stopping time for $\{X_n\}$.
Show that

$$E(|X_T|) \le 2 \sup_n E(X_n^+) - E(X_1).$$

As in Problem 1, the analogous result for supermartingales is

$$E(|X_T|) \le 2 \sup_n E(X_n^-) + E(X_1).$$

3. (Sub- and supermartingale inequalities) (a) Let $\{X_1, \ldots, X_n\}$ be a
submartingale. If $\lambda > 0$, show that

$$\lambda P\{\max_{1 \le i \le n} X_i \ge \lambda\} \le \int_{\{\max_{1 \le i \le n} X_i \ge \lambda\}} X_n\,dP \le E(X_n^+).$$

(b) Let $\{X_1, \ldots, X_n\}$ be a supermartingale. If $\lambda > 0$, show that

$$\lambda P\{\max_{1 \le i \le n} X_i \ge \lambda\} \le E(X_1) - \int_{\{\max_{1 \le i \le n} X_i < \lambda\}} X_n\,dP \le E(X_1) + E(X_n^-).$$

(Apply 6.7.3, with $T = \min\{i: X_i \ge \lambda\}$; T = n if all $X_i < \lambda$.)
(c) If $\{X_1, X_2, \ldots\}$ is a submartingale and $\lambda > 0$, show that

$$\lambda P\{\sup_n X_n \ge \lambda\} \le \sup_n E(X_n^+);$$

if $\{X_1, X_2, \ldots\}$ is a supermartingale and $\lambda > 0$, show that

$$\lambda P\{\sup_n X_n \ge \lambda\} \le E(X_1) + \sup_n E(X_n^-).$$

4. Use Problem 3 to give an alternative proof of Kolmogorov's inequality
6.1.4.
5. (Wald's theorem on the sum of a random number of random variables) Let
$Y_1, Y_2, \ldots$ be independent, identically distributed random variables with
finite mean m, and let $X_n = \sum_{k=1}^{n} Y_k$. If T is a finite stopping time for
$\{X_n\}$, establish the following:

(a) If all $Y_j \ge 0$, then $E(X_T) = mE(T)$.

(b) If $E(T) < \infty$, then $E(|X_T|) < \infty$ and $E(X_T) = mE(T)$.
[Let $T_n = T \wedge n$ and apply 6.7.3 to $\{X_n - nm\}$ to prove (a); use (a) to
prove (b).]

(c) If T is a positive integer-valued random variable that is independent
of $(Y_1, Y_2, \ldots)$, but not necessarily a stopping time, show that the
results (a) and (b) still hold.

6. [Alternative proof of the upcrossing theorem (Meyer, 1966)] Let
$\{X_k, \mathscr{F}_k, k = 1, \ldots, n\}$ be a nonnegative submartingale, and U the number
of upcrossings of (0, b) by $X_1, \ldots, X_n$. Define the stopping times $T_i$ as in
6.4.2, and for convenience set $X_\infty = X_n$.

(a) Show that $X_n = \sum_{k=1}^{n} (X_{T_k} - X_{T_{k-1}})$ (take $X_{T_0} \equiv 0$).

(b) Show that $E(X_n) \ge bE(U)$; the general upcrossing theorem is then
obtained just as in 6.4.2. [Since $E(X_{T_k} - X_{T_{k-1}}) \ge 0$ for all k, and
$X_{T_k} - X_{T_{k-1}} \ge b$ if k is even and $T_k < \infty$,

$$E(X_n) \ge \sum_{k=1,\ k \text{ even}}^{n} E(X_{T_k} - X_{T_{k-1}}) \ge bE(U).]$$

6.8 APPLICATIONS OF MARTINGALE THEORY 

Martingale ideas provide fresh insights and simplifications for many prob- 
lems in probability; in this section we consider some important examples. 
First, we use the martingale convergence theorem to provide a short proof of 
the strong law of large numbers for iid random variables (see 6.2.5). We need 
two preliminary facts. 


6.8.1 Lemma. If $X_1, \ldots, X_n$ are independent, identically distributed random
variables with finite expectation, and $S_n = \sum_{k=1}^{n} X_k$, then

$$E(X_k | S_n) = \frac{S_n}{n} \quad \text{a.e.}, \qquad k = 1, \ldots, n.$$

Intuitively, given $S_n = X_1 + \cdots + X_n$, the average contribution of each $X_k$ is
the same, and hence must be $S_n/n$.


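PROOF. If $B \in \mathscr{B}(R)$, then

$$\int_{\{S_n \in B\}} X_k\,dP = E[X_k I_{\{S_n \in B\}}] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_k I_B(x_1 + \cdots + x_n)\,dF(x_1) \cdots dF(x_n),$$

where F is the distribution function of the $X_i$. By Fubini's theorem, this is
independent of k; hence

$$\int_{\{S_n \in B\}} X_k\,dP = \frac{1}{n} \int_{\{S_n \in B\}} \sum_{j=1}^{n} X_j\,dP = \int_{\{S_n \in B\}} \frac{S_n}{n}\,dP. \quad □$$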
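
For a distribution with finitely many values, Lemma 6.8.1 can be verified by brute-force enumeration. The sketch below is an illustration under assumptions not in the text: the $X_i$ take the values in `values` with the exact probabilities in `probs`, and the function name is hypothetical. It computes $E(X_k | S_n = s)$ as $E[X_k; S_n = s] / P\{S_n = s\}$ for every attainable s.

```python
from fractions import Fraction
from itertools import product

def conditional_mean_given_sum(values, probs, n, k):
    """Return {s: E(X_k | S_n = s)} for iid X_1..X_n with a finite law.

    values[i] is taken with probability probs[i]; k is 1-based.
    """
    numer = {}  # s -> E[X_k ; S_n = s]
    denom = {}  # s -> P{S_n = s}
    for idx in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for i in idx:
            prob *= probs[i]
        xs = [values[i] for i in idx]
        s = sum(xs)
        numer[s] = numer.get(s, 0) + prob * xs[k - 1]
        denom[s] = denom.get(s, 0) + prob
    return {s: numer[s] / denom[s] for s in denom}

if __name__ == "__main__":
    law = ([0, 1, 3], [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)])
    print(conditional_mean_given_sum(*law, n=3, k=2))
```

Each conditional mean equals s/n regardless of k, which is the content of the lemma for this special case.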


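6.8.2 Lemma. If $X_1, X_2, \ldots$ are random variables and $S_n = \sum_{k=1}^{n} X_k$, then

$$\sigma(S_n, S_{n+1}, S_{n+2}, \ldots) = \sigma(S_n, X_{n+1}, X_{n+2}, \ldots).$$


PROOF. Since $X_{n+k} = S_{n+k} - S_{n+k-1}$, the variables $S_n, X_{n+1}, X_{n+2}, \ldots$ are each
$\sigma(S_n, S_{n+1}, S_{n+2}, \ldots)$-measurable; therefore $\sigma(S_n, X_{n+1}, X_{n+2}, \ldots) \subset \sigma(S_n, S_{n+1}, S_{n+2},
\ldots)$. Similarly, $S_n, S_{n+1}, S_{n+2}, \ldots$ are each $\sigma(S_n, X_{n+1}, X_{n+2}, \ldots)$-measurable;
hence $\sigma(S_n, S_{n+1}, S_{n+2}, \ldots) \subset \sigma(S_n, X_{n+1}, X_{n+2}, \ldots)$. □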


6.8.3 Strong Law of Large Numbers, iid Case. If $X_1, X_2, \ldots$ are iid random
variables with finite expectation m, and $S_n = X_1 + \cdots + X_n$, then $S_n/n \to m$
a.e. and in $L^1$.


PROOF. $(X_1, \ldots, X_n)$ and $(X_{n+1}, X_{n+2}, \ldots)$ are independent [Problem 1(a),
Section 4.11]; hence $(X_1, S_n)$ and $(X_{n+1}, X_{n+2}, \ldots)$, as functions of
independent random objects, are independent by 4.8.2(d). Therefore

$$E(X_1 | S_n) = E(X_1 | S_n, X_{n+1}, X_{n+2}, \ldots) \quad \text{by Problem 2, Section 5.5}$$
$$= E(X_1 | S_n, S_{n+1}, S_{n+2}, \ldots) \quad \text{by 6.8.2.}$$

Thus by 6.8.1,

$$E(X_1 | S_n, S_{n+1}, \ldots) = \frac{S_n}{n} \quad \text{a.e.}$$

But by 6.6.3 and 6.6.4, $E(X_1 | S_n, S_{n+1}, \ldots) \to E(X_1 | \mathscr{F}_\infty)$ a.e. and in $L^1$, where
$\mathscr{F}_\infty$ is the tail σ-field of the $S_n$. Thus $S_n/n$ converges a.e. and in $L^1$ to a finite
limit.
Now to show that the limit is in fact m, we may proceed in two ways.
One approach is to note that $\lim_{n \to \infty} (S_n/n)$ is a tail function of the $X_n$, and
hence is a.e. constant by the Kolmogorov zero-one law 6.2.7. Since $S_n/n$ is
$L^1$-convergent and $E(S_n/n) = m$, the constant must be m.

Alternatively, we may use the Hewitt-Savage zero-one law 6.2.9 to show
that each set in $\mathscr{F}_\infty$ has probability 0 or 1. For if $A \in \mathscr{F}_\infty$ and T permutes n
coordinates, then $A \in \sigma(S_n, S_{n+1}, \ldots)$; hence A is of the form $\{(S_n, S_{n+1}, \ldots)
\in A'\}$ for some $A' \in [\mathscr{B}(R)]^\infty$. Since $S_k = X_1 + \cdots + X_k = X_{T(1)}
+ \cdots + X_{T(k)}$, $k \ge n$, A is symmetric, and, therefore, the Hewitt-Savage
zero-one law is applicable. Thus $\mathscr{F}_\infty$ is trivial, and it follows that $E(X_1 | \mathscr{F}_\infty)
= E(X_1)$ a.e. by 5.5.7(a). □

Notice that we have obtained $L^1$ convergence in the strong law of large
numbers; this would be more cumbersome to derive using the classical approach
of Section 6.2.

We now consider the general problem of convergence of series of indepen- 
dent random variables. The following variation of the martingale convergence 
theorem will be proved first. The technique of the proof rather than the result 
itself will be used in the development, but the theorem does have applications 
to series of dependent random variables. (For example, see Problem 5.) 


6.8.4 Theorem. Let $\{X_n, \mathscr{F}_n, n = 1, 2, \ldots\}$ be a submartingale, and let
$Z = \sup_n (X_n - X_{n-1})$ (define $X_0 \equiv 0$). If $E(Z) < \infty$ (for example, if
$X_n - X_{n-1}$ is less than a constant for all $n \ge 2$), then $X_n$ converges to a finite
limit a.e. on the set $\{\sup_n X_n < \infty\}$. If $\{X_n, \mathscr{F}_n\}$ is a supermartingale and
$E[\inf_n (X_n - X_{n-1})] > -\infty$, then $X_n$ converges to a finite limit a.e. on
$\{\inf_n X_n > -\infty\}$.


PROOF. Fix M > 0, and let $T = \inf\{n: X_n > M\}$; set $T = \infty$ if there is no
such n. Define $T_n = T \wedge n$; if $Y_n = X_{T_n}$, n = 1, 2, ..., then $\{Y_n\}$ is a
submartingale by 6.7.4(a). (This is sometimes called the optional stopping theorem
since $Y_n = X_n$ if n < T, $Y_n = X_T$ if $n \ge T$; thus $\{Y_n\}$ is the original process
stopped at time T.)

Now if n < T, then $Y_n = X_n = X_{n-1} + (X_n - X_{n-1}) \le M + Z$, and if
$n \ge T$, then $Y_n = X_{T-1} + (X_T - X_{T-1}) \le M + Z$. Thus $Y_n \le M + Z$ in any
case, so $\sup_n E(Y_n^+) \le M + E(Z^+) < \infty$ by hypothesis. By 6.4.3, $Y_n$
converges a.e. to a finite limit. But if $T = \infty$, then $Y_n \equiv X_n$; hence $X_n$ converges
a.e. on $\{\sup_n X_n \le M\}$. Since M is arbitrary, $X_n$ converges a.e. on $\{\sup_n X_n <
\infty\}$. The last statement of the theorem is proved by applying the above
argument to $-X_n$. □


We have seen in 6.2.1 that if $Y_1, Y_2, \ldots$ are independent random variables
with 0 mean, and $\sum_{k=1}^{\infty} E(Y_k^2) < \infty$, then $\sum_{k=1}^{\infty} Y_k$ converges a.e. There is
a partial converse to this result, which we prove after one preliminary.
6.8.5 Theorem. If $\{X_1, X_2, \ldots\}$ is a martingale and $E(X_n^2) < \infty$ for all
n, then the martingale differences $X_1, X_2 - X_1, \ldots, X_n - X_{n-1}, \ldots$ are
orthogonal.


PROOF. If j < k, and $\mathscr{F}_j = \sigma(X_1, \ldots, X_j)$,

$$E[(X_j - X_{j-1})(X_k - X_{k-1})] = E[E((X_j - X_{j-1})(X_k - X_{k-1}) | \mathscr{F}_j)]$$
$$= E[(X_j - X_{j-1}) E(X_k - X_{k-1} | \mathscr{F}_j)]$$

since $X_j - X_{j-1}$ is $\mathscr{F}_j$-measurable. But $E(X_k - X_{k-1} | \mathscr{F}_j) = X_j - X_j = 0$ by
the martingale property, and the result follows. □


6.8.6 Theorem. Let $Y_1, Y_2, \ldots$ be independent random variables with 0
mean, and assume $E[\sup_k Y_k^2] < \infty$. (For example, this holds if the $Y_k$ are
uniformly bounded.) If $\sum_{k=1}^{\infty} Y_k$ converges a.e., then $\sum_{k=1}^{\infty} E(Y_k^2) < \infty$. (As
in Section 6.2, "convergence" of a series means convergence to a finite limit.)


PROOF. The $X_n = \sum_{k=1}^{n} Y_k$ form a martingale. Choose M such that

$$P\{\sup_n |X_n| \le M\} > 0;$$

this is possible since the series converges a.e.

Let $T = \inf\{n: |X_n| > M\}$; $T = \infty$ if there is no such n. If $T_n = T \wedge n$, then
$\{X_{T_n}\}$ is a martingale, and just as in 6.8.4, $|X_{T_n}| \le M + \sup_j
|X_j - X_{j-1}| = M + Z$, where $E(Z^2) < \infty$ by hypothesis. It follows that the
numbers $E(X_{T_n}^2)$ are uniformly bounded, so by 6.6.10, $X_{T_n}$ converges a.e.
and in $L^2$. But by 6.8.5, $E(X_{T_n}^2) = \sum_{j=1}^{n} E[(X_{T_j} - X_{T_{j-1}})^2]$ (take $X_{T_0} \equiv 0$).
Since $X_{T_n}$ is $L^2$-convergent, $E(X_{T_n}^2)$ approaches a finite limit; hence
$\sum_{j=1}^{\infty} E[(X_{T_j} - X_{T_{j-1}})^2] < \infty$.

But $X_{T_n} - X_{T_{n-1}} = Y_n I_{\{T \ge n\}}$, so $\sum_{n=1}^{\infty} E(Y_n^2 I_{\{T \ge n\}}) < \infty$, and consequently

$$\sum_{n=1}^{\infty} E(Y_n^2 I_{\{T \ge n\}} | Y_1, \ldots, Y_{n-1}) < \infty \quad \text{a.e.}$$

[To see this, note that if $Z_n \ge 0$ and $\sum_n E(Z_n) < \infty$, then $E(\sum_n Z_n) < \infty$,
so $\sum_n Z_n < \infty$ a.e.; set $Z_n = E(Y_n^2 I_{\{T \ge n\}} | Y_1, \ldots, Y_{n-1})$.]
Now $I_{\{T \ge n\}} = 1 - I_{\{T \le n-1\}}$, which is $\sigma(Y_1, \ldots, Y_{n-1})$-measurable; hence

$$\sum_{n=1}^{\infty} I_{\{T \ge n\}} E(Y_n^2 | Y_1, \ldots, Y_{n-1}) = \sum_{n=1}^{\infty} I_{\{T \ge n\}} E(Y_n^2) < \infty \quad \text{a.e.}$$

Pick an ω where the series converges and where $\sup_n |X_n(\omega)| \le M$ (see
the beginning of the proof). Then $T(\omega) = \infty \ge n$ for all n; it follows that
$\sum_n E(Y_n^2) < \infty$. □


We now have a partial solution to the general problem.


6.8.7 Theorem. Let $Y_1, Y_2, \ldots$ be independent random variables with 0
mean; assume $E[\sup_k Y_k^2] < \infty$. Then $\sum_{k=1}^{\infty} Y_k$ converges a.e. if and only
if $\sum_{k=1}^{\infty} E(Y_k^2) < \infty$.


PROOF. Apply 6.2.1 and 6.8.6. □


We can now complete the solution to the random signs problem (see the
discussion after 6.2.1). If $X_n = a_n Y_n$, where the $Y_n$ are independent with
$P\{Y_n = 1\} = P\{Y_n = -1\} = \frac{1}{2}$ and $\sum_n X_n$ converges a.e., then $a_n \to 0$, so
the $X_n$ are uniformly bounded. By 6.8.7, $\sum_n a_n^2 < \infty$.

Incidentally, there is a martingale proof of 6.2.1. If $X_n = \sum_{k=1}^{n} Y_k$, where
the $Y_k$ are independent with 0 mean and $\sum_{k=1}^{\infty} E(Y_k^2) < \infty$, then

$$E(|X_n|) \le [E(X_n^2)]^{1/2} \quad \text{by the Cauchy-Schwarz inequality}$$
$$= \Big[\sum_{k=1}^{n} E(Y_k^2)\Big]^{1/2}$$

since the $Y_k$ are independent with 0 mean, hence orthogonal. Thus $\sup_n E(|X_n|)
< \infty$; hence $X_n$ converges a.e.

The same argument also shows that if $\{X_n\}$ is a martingale and
$\sum_n E[(X_n - X_{n-1})^2] < \infty$ (so that the $X_n - X_{n-1}$ are orthogonal by 6.8.5, but
not necessarily independent), then $X_n$ converges a.e.

We now drop the hypothesis of zero mean. 


6.8.8 Theorem. Let $Y_1, Y_2, \ldots$ be independent, uniformly bounded random
variables. Then $\sum_{k=1}^{\infty} Y_k$ converges a.e. iff $\sum_{k=1}^{\infty} \mathrm{Var}\, Y_k < \infty$ and $\sum_{k=1}^{\infty} E(Y_k)$
converges.


PROOF. "If": By 6.8.7, $\sum_k (Y_k - E(Y_k))$ converges a.e.; since $\sum_k E(Y_k)$
converges, we have $\sum_k Y_k$ convergent a.e.

"Only if": by symmetrization. Let $Y_1, Z_1, Y_2, Z_2, \ldots$ be independent, where
for each j, $Z_j$ has the same distribution as $Y_j$. Then $E(Y_j - Z_j) = E(Y_j)
- E(Z_j) = 0$, $E[(Y_j - Z_j)^2] = \mathrm{Var}(Y_j - Z_j) = \mathrm{Var}\, Y_j + \mathrm{Var}\, Z_j = 2\,\mathrm{Var}\, Y_j$.
Since $\sum_j Y_j$ converges a.e., so does $\sum_j Z_j$. [To see this, note that
$P\{(Y_1, \ldots, Y_n) \in B\} = P\{(Z_1, \ldots, Z_n) \in B\}$ for all n and all $B \in \mathscr{B}(R^n)$;
hence $P_Y = P_Z$.] It follows that $\sum_j (Y_j - Z_j)$ converges a.e., so by 6.8.7,
$\sum_j \mathrm{Var}\, Y_j < \infty$. Again by 6.8.7, $\sum_j (Y_j - E(Y_j))$ converges a.e.; hence
$\sum_j E(Y_j)$ converges. □


Finally, we obtain a general criterion for convergence of series of 
independent random variables. 


6.8.9 Kolmogorov Three Series Theorem. Let $Y_1, Y_2, \ldots$ be independent
random variables. If M > 0, define

$$Y_j' = \begin{cases} Y_j & \text{if } |Y_j| \le M, \\ 0 & \text{if } |Y_j| > M. \end{cases}$$

(a) If $\sum_j Y_j$ converges a.e., then for any M > 0, the three series
$\sum_j P\{Y_j \ne Y_j'\}$, $\sum_j E(Y_j')$, $\sum_j \mathrm{Var}\, Y_j'$ all converge.
(b) If for some M > 0, the three series converge, then $\sum_j Y_j$ converges a.e.


PROOF. (a) By hypothesis, $Y_j \to 0$ a.e., so eventually $Y_j = Y_j'$. Thus (a.e.)
$Y_j \ne Y_j'$ for only finitely many j, that is, $P(\limsup_j \{Y_j \ne Y_j'\}) = 0$. By the
second Borel-Cantelli lemma, $\sum_j P\{Y_j \ne Y_j'\} < \infty$. Since $\sum_j Y_j'$ also converges
a.e. and the $Y_j'$ are uniformly bounded, the other two series converge by 6.8.8.

(b) By 6.8.8, $\sum_j Y_j'$ converges a.e. Since $\sum_j P\{Y_j \ne Y_j'\} < \infty$, we have,
almost surely, $Y_j = Y_j'$ eventually; hence $\sum_j Y_j$ converges a.e. □


6.8.10 Branching Processes. As a final example we analyze in detail the
branching process of 6.3.3(d). Recall that $X_0 = 1$, and if $X_n = k$, then
$X_{n+1} = \sum_{i=1}^{k} Y_i$, where $Y_1, \ldots, Y_k$ are independent and $P\{Y_i = j\} = p_j$,
j = 0, 1, .... We assume that $m = E(Y_i) = \sum_{j=0}^{\infty} j p_j > 0$.

This excludes the degenerate case $p_0 = 1$ (in this case $X_n \equiv 0$ for all $n \ge 1$).
We also assume that $p_0 + p_1 < 1$. (If $p_0 + p_1 = 1$, then $X_n \le 1$ for all n. If
$p_0 > 0$, then $X_n$ is eventually 0 since $P\{X_n = 1 \text{ for all } n\} = \lim_n p_1^n = 0$,
and if $p_0 = 0$, then $X_n \equiv 1$.)

Case 1: m < 1. In this case, almost surely, $X_n$ is 0 eventually; thus the
family name is extinguished with probability 1.

For $E(X_{n+1} | X_n = k) = kE(Y_i) = km$; hence $E(X_{n+1} | X_n) = mX_n$. It follows
that $E(X_{n+1}) = mE(X_n)$. If m < 1, then

$$E\Big(\sum_{n=1}^{\infty} X_n\Big) = \sum_{n=1}^{\infty} E(X_n) < \infty,$$

so $X_n \to 0$ a.e. But the $X_n$ are integer-valued, and thus with probability 1, $X_n$
is ultimately 0.


Case 2: m > 1. We show that with probability r, $X_n$ is eventually 0, and
with probability 1 − r, $X_n \to \infty$, where r is the unique root in [0, 1) of the
equation $\sum_{j=0}^{\infty} p_j s^j = s$.
Figure 6.8.1.


Let $g(s) = \sum_{j=0}^{\infty} p_j s^j$, $0 \le s \le 1$. Consider $d[g(s) - s]/ds = g'(s) - 1$. We
have $g'(0) - 1 = p_1 - 1 < 0$, $g'(1) - 1 = m - 1 > 0$, and since $p_0 + p_1 < 1$,
$g'(s) - 1$ is strictly increasing. Since $g(s) - s = p_0$ when s = 0 and 0 when
s = 1, $g(s) - s$ strictly decreases to a minimum occurring somewhere in
(0, 1), and then strictly increases to 0 at s = 1. It follows that g(s) = s for
exactly one $s \in [0, 1)$, say at s = r (Fig. 6.8.1).

First assume 0 < r < 1 (hence $p_0 > 0$). By 6.3.3(e), $\{r^{X_n}\}$ is a nonnegative
martingale, and hence converges a.e. Since $X_n$ is nonnegative integer-valued,
this means that for almost every ω, $X_n(\omega)$ becomes constant (the constant
depending on ω) or $X_n(\omega) \to \infty$.

Now $P\{X_n \text{ eventually constant}\} = \sum_{k=0}^{\infty} P\{X_n = k \text{ for sufficiently large } n\}$.
If $k \ge 1$ and $P\{X_n = k \text{ eventually}\} > 0$, then $P\{X_n = k \text{ for all } n \ge N\} > 0$
for some N. But by the Markov property, this probability must be
$P\{X_N = k\} \lim_{j \to \infty} q^j$, where $q = P\{X_{n+1} = k | X_n = k\}$. Now q < 1 since
$P\{X_{n+1} = 0 | X_n = k\} = p_0^k > 0$. Thus $\lim_{j \to \infty} q^j = 0$, a contradiction.

Therefore $X_n \to X_\infty$ a.e., where $X_\infty = 0$ or ∞. Since $\{r^{X_n}, n = 0, 1, \ldots\}$ is
bounded, the dominated convergence theorem gives $E(r^{X_n}) \to E(r^{X_\infty})
= 1 \cdot P\{X_\infty = 0\} + 0 \cdot P\{X_\infty = \infty\} = P\{X_\infty = 0\}$. But by the martingale
property, $E(r^{X_n}) = E(r^{X_0}) = r$. Thus with probability r, $X_n$ is eventually 0, and
with probability 1 − r, $X_n \to \infty$.

If r = 0, then $p_0 = 0$; hence $X_{n+1} \ge X_n \ge 1$, so that $X_n$ increases to a limit
X. But since the $X_n$ are positive integer-valued,

$$P\{X < \infty\} = P\{X_n \text{ eventually constant}\} = \sum_{k=1}^{\infty} P\{X_n = k \text{ eventually}\} = 0$$

by the same argument as in the case 0 < r < 1.
(In the current situation, $q = P\{X_{n+1} = k | X_n = k\} = P\{Y_1 = \cdots = Y_k = 1\}
= p_1^k < 1$.) Thus $X_n \to \infty$ a.e., as desired.
Case 3: m = 1. Here, extinction occurs with probability 1. For by 6.3.3(d),
$\{X_n\}$ is a nonnegative martingale, and hence converges a.e. to a finite limit.
The analysis of case 2 shows that $X_n$ cannot approach a nonzero constant
on a set of positive probability, and hence $X_n \to 0$ a.e.; in other words, with
probability 1, $X_n$ is eventually 0.
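
The extinction probability in the three cases can be approximated numerically. The sketch below is an illustration, not part of the argument above: it assumes an offspring distribution with finitely many values, given as a list `p` with `p[j]` $= p_j$, and the function names are hypothetical. It uses the standard fact that, with $X_0 = 1$, $P\{X_n = 0\}$ is the n-fold iterate of g evaluated at 0, so iterating $s \mapsto g(s)$ from s = 0 increases to the smallest root of g(s) = s in [0, 1].

```python
def offspring_mean(p):
    """m = sum_j j * p[j]."""
    return sum(j * pj for j, pj in enumerate(p))

def pgf(p, s):
    """Generating function g(s) = sum_j p[j] * s**j."""
    return sum(pj * s ** j for j, pj in enumerate(p))

def extinction_probability(p, iterations=500):
    """Approximate P{X_n is eventually 0} by iterating s -> g(s) from 0.

    The k-th iterate equals P{X_k = 0} when X_0 = 1.
    """
    s = 0.0
    for _ in range(iterations):
        s = pgf(p, s)
    return s

if __name__ == "__main__":
    supercritical = [0.25, 0.25, 0.5]   # m = 1.25, so r should be 1/2
    subcritical = [0.5, 0.25, 0.25]     # m = 0.75, so extinction is certain
    for law in (supercritical, subcritical):
        print(offspring_mean(law), extinction_probability(law))
```

For the first distribution, g(s) = s reduces to $2s^2 - 3s + 1 = 0$, with roots 1/2 and 1, so r = 1/2. When m = 1 the iteration still converges to 1, but only slowly, so a fixed iteration count gives a rough value in that case.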


Problems 


1. Let $X_1, X_2, \ldots$ be arbitrary random variables, with $S_n = X_1 + \cdots + X_n$.
What is the relation between the tail σ-field of the $X_n$ and the tail σ-field
of the $S_n$?

2. What happens in the branching process (see 6.8.10) if instead of $X_0
= 1$, $X_0$ is an arbitrary, positive integer-valued random variable?

3. (Breiman's realistic gambling model) Let $X_n$ be a gambler's capital after
n plays of a game of chance. Assume that the gambler has the option of
betting or passing at each trial. If he passes at trial n + 1, then $X_{n+1} = X_n$,
and if he bets, then we assume that $|X_{n+1} - X_n| \ge b > 0$; thus there is a
minimum amount b that can be won or lost on a given trial. We do not spell
out the gambler's strategy in detail; we simply assume that his strategy,
together with the house rules, determines the distribution of $(X_0, X_1, \ldots)$.
It is reasonable to assume that the game is unfavorable or at best fair;
thus we take $\{X_n\}$ as a nonnegative supermartingale.

Let T be the time of the last bet, that is, the largest n such that $|X_{n+1} - X_n|
\ge b$ ($T = \infty$ if there is no such n). Note that T is not a stopping time.

(a) Show that T is a.e. finite.

(b) Show that $E(X_T) \le E(X_0)$, so no system can increase the expected
winning.

(c) If the gambler's strategy is always to bet so long as his capital is at
least b, show that (a.e.) $X_n$ will eventually be less than b, in other
words, the persistent gambler goes broke with probability 1.

(d) If an unbiased coin is tossed independently over and over again, 
and we win a dollar for each head and lose a dollar for each tail, 
our accumulated capital must eventually reach +1 (see the end of 
Section 6.7). Suppose our strategy is simply to wait until we reach 
1 and then quit, thus guaranteeing a profit. Why is this not realistic? 

4. Let $Y_1, Y_2, \ldots$ be integrable random variables, and $\mathscr{F}_1, \mathscr{F}_2, \ldots$ an increasing
sequence of sub σ-fields of $\mathscr{F}$. Assume that $Y_k$ is $\mathscr{F}_k$-measurable for
each k, and define

$$X_n = \sum_{k=1}^{n} [Y_k - E(Y_k | \mathscr{F}_{k-1})]$$

[take $\mathscr{F}_0 = \{\emptyset, \Omega\}$, so that $E(Y_1 | \mathscr{F}_0) = E(Y_1)$]. Show that $\{X_n, \mathscr{F}_n\}$ is a
martingale.

5. (Lévy's extension of the Borel-Cantelli lemma) Let $\{\mathscr{F}_n\}$ be an increasing
sequence of σ-fields, and let $A_n \in \mathscr{F}_n$, n = 1, 2, .... Define

$$q_j = P(A_j | \mathscr{F}_{j-1}), \qquad j = 1, 2, \ldots$$

[take $\mathscr{F}_0 = \{\emptyset, \Omega\}$, so that $q_1 = P(A_1)$]. Show that (a.e.) infinitely many
$A_j$ occur iff $\sum_j q_j = \infty$, that is, $P[(\limsup_j A_j) \,\Delta\, \{\sum_j q_j = \infty\}] = 0$.
Equivalently, $\sum_j I_{A_j}$ and $\sum_j q_j$ have essentially the same convergence
set. [Apply 6.8.4 to $X_n = \sum_{j=1}^{n} (I_{A_j} - q_j)$.]


6.9 APPLICATIONS TO MARKOV CHAINS 

In this section we apply martingale theory to the problem of classifying
the states of a Markov chain. We must use a few basic properties of Markov
chains; the reader who is unfamiliar with this subject may consult Ash (1970,
Chapter 7). In particular, a state i is said to be recurrent iff starting at i there
will be a return to i with probability 1; otherwise the state is transient. If C
is a set of states such that every state in C can be reached (in a finite number
of steps) from every other state, then all states in C are of the same type,
recurrent or transient.

We have the following criterion. 


6.9.1 Theorem. Let $[p_{ij}]$ be the transition matrix of a Markov chain such
that every state in the state space S is reachable from every other state
(sometimes called an irreducible chain). Choose a fixed state, and label it
0 for convenience. The states are transient iff there is a nonconstant bounded
$f: S \to R$ such that $\sum_{j \in S} p_{ij} f(j) = f(i)$ for all $i \ne 0$.


PROOF. Suppose such an f exists. By adding a constant to f we may assume
that $f \ge 0$. Assume the initial state is $i \ne 0$, and let $\{X_n\}$ be the corresponding
sequence of random variables. Let T be the time at which 0 is reached, and
let $Y_n = X_{T \wedge n}$, n = 0, 1, ...; $\{Y_n\}$ can be realized as a Markov chain with the
same initial distribution and transition matrix as $\{X_n\}$, except that 0 is now an
absorbing state. In other words, the transition matrix for $\{Y_n\}$ is

$$\hat{p}_{ij} = p_{ij} \ \text{for all } j \text{ if } i \ne 0, \qquad \hat{p}_{00} = 1.$$
Thus $\sum_{j \in S} \hat{p}_{ij} f(j) = f(i)$ for all $i \in S$. In matrix form, $\hat{\Pi} f = f$; by induction
$\hat{\Pi}^n f = f$, that is,

$$\sum_{j \in S} \hat{p}_{ij}^{(n)} f(j) = f(i),$$

where $\hat{p}_{ij}^{(n)} = P\{Y_n = j | Y_0 = i\}$. But this says that $E[f(Y_n) | Y_0 = i] = f(i)$.

If the states of the original chain are recurrent, then 0 will be visited with
probability 1; hence $Y_n \to 0$ a.e. By the dominated convergence theorem,
$E[f(Y_n) | Y_0 = i] \to f(0)$. We conclude that f(i) = f(0) for all i, contradicting
the hypothesis that f is nonconstant.

Conversely, if the states are transient, we define $f: S \to R$ as follows. If
$i \ne 0$, let $f(i) = f_{i0}$, the probability that, starting from i, 0 will eventually be
reached; take f(0) = 1. Now in order ultimately to reach 0 from $i \ne 0$, we
may either go directly to 0 at step 1, or go to a state $j \ne 0$ and then reach 0
at some time after the first step. It follows that

$$f(i) = \sum_{j \in S} p_{ij} f(j), \qquad i \ne 0.$$

(This may be formalized using the Markov property.)
Now f is clearly bounded, and $f_{i0} < 1$ for some $i \ne 0$, otherwise 0 would
be a recurrent state. Thus f is nonconstant. □


Martingale theory is used in deriving the following sufficient condition for 
recurrence. 


6.9.2 Theorem. Let $[p_{ij}]$ be the transition matrix of an irreducible Markov
chain whose state space S is the set of nonnegative integers. If there is a
function $f: S \to R$ such that $f(i) \to \infty$ as $i \to \infty$, and $\sum_j p_{ij} f(j) \le f(i)$
for all $i \ne 0$, then the chain is recurrent.


PROOF. As $f(i) \to \infty$ as $i \to \infty$, f is bounded below, so without loss of
generality we may assume $f \ge 0$. Let the initial state be $i \ne 0$, and form the
process $\{Y_n\}$ as in 6.9.1. Then $\sum_{j \in S} \hat{p}_{ij} f(j) \le f(i)$ for all i, which implies that
$\{f(Y_n)\}$ is a nonnegative supermartingale, and hence converges a.e. to a finite
limit. For

$$E[f(Y_n) | Y_0 = i_0, \ldots, Y_{n-1} = i_{n-1}] = E[f(Y_n) | Y_{n-1} = i_{n-1}] \quad \text{by the Markov property}$$
$$= \sum_j \hat{p}_{i_{n-1} j} f(j) \le f(i_{n-1}).$$

Note also that

$$E[f(Y_n) | Y_0 = i] = \sum_{j \in S} \hat{p}_{ij}^{(n)} f(j) \le f(i) < \infty;$$

hence $\{f(Y_n)\}$ is integrable.

Assume the states transient. Then $f_{i0} < 1$ for some i > 0. Choose such
an i as the initial state. Now $X_n \to \infty$ a.e. since a finite set of transient
states cannot be visited infinitely often (see Ash, 1970, p. 223). This means
that with probability 1, $Y_n \to 0$ or ∞. But $P\{Y_n \to 0\} = f_{i0} < 1$, and hence
$P\{Y_n \to \infty\} > 0$; this implies that $P\{f(Y_n) \to \infty\} > 0$, a contradiction. □


The proof of 6.9.2 shows that if $[p_{ij}]$ is the transition matrix of a Markov
chain $\{X_n\}$, f is a real-valued function on the state space, and

$$\sum_j p_{ij} f(j) \le f(i) \quad \text{for all } i,$$

where the series is assumed to converge absolutely, then, with a fixed initial
state i, the sequence $\{f(X_n)\}$ is a supermartingale. Similarly, replacement of
"≤" by "=" in this equation yields a martingale, and replacement by "≥"
yields a submartingale.

We now apply 6.9.1 and 6.9.2 to a queueing process. 


6.9.3 Example. Assume that customers are to be served at discrete times
t = 0, 1, ..., and at most one customer can be served at a given time. Say there
are $X_t$ customers before the completion of service at time t, and in the interval
[t, t + 1), $Y_t$ new customers arrive, where $P\{Y_t = k\} = p_k$, k = 0, 1, .... The
number of customers before completion of service at time t + 1 is

$$X_{t+1} = (X_t - 1)^+ + Y_t.$$

The queueing process may be represented as a Markov chain whose state
space is the set of nonnegative integers and whose transition matrix is

$$\Pi = \begin{array}{c|ccccc} & 0 & 1 & 2 & 3 & \cdots \\ \hline 0 & p_0 & p_1 & p_2 & p_3 & \cdots \\ 1 & p_0 & p_1 & p_2 & p_3 & \cdots \\ 2 & 0 & p_0 & p_1 & p_2 & \cdots \\ 3 & 0 & 0 & p_0 & p_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{array}$$

We assume that $p_0 > 0$ and $p_0 + p_1 < 1$, so that the chain is irreducible. We
analyze the behavior of the chain using 6.9.1 and 6.9.2.
The equations $\sum_{j=0}^{\infty} p_{ij} f(j) = f(i)$, i > 0, are

$$p_0 f(i-1) + p_1 f(i) + p_2 f(i+1) + \cdots + p_n f(i+n-1) + \cdots = f(i). \qquad (1)$$

Let $m = E(Y_t) = \sum_{k=0}^{\infty} k p_k$; we show that the states are transient if m > 1,
recurrent if $m \le 1$.
First assume m > 1; if $f(i) = r^i$, then (1) becomes

$$p_0 r^{i-1} + p_1 r^i + p_2 r^{i+1} + \cdots + p_n r^{i+n-1} + \cdots = r^i$$

or

$$\sum_{k=0}^{\infty} p_k r^k = r.$$

But this can be satisfied for some $r \in (0, 1)$ (see case 2 of 6.8.10). Thus $\{r^i\}$
is bounded and nonconstant, so by 6.9.1, the states are transient.
Now assume $m \le 1$, and let f(i) = i. Then if i > 0,

$$\sum_{j=0}^{\infty} p_{ij} f(j) = p_0 (i-1) + p_1 i + p_2 (i+1) + \cdots$$
$$= \sum_{k=i-1}^{\infty} k\, p_{k-i+1}$$
$$= \sum_{k=i-1}^{\infty} (k-i+1)\, p_{k-i+1} + i - 1$$
$$= \sum_{k=0}^{\infty} k p_k + i - 1 \le 1 + i - 1 = i = f(i).$$

By 6.9.2, the states are recurrent.

If i is a recurrent state and $\mu_i$ is the average length of time required to return
to i when the initial state is i, then i is said to be recurrent null if $\mu_i = \infty$,
recurrent positive if $\mu_i < \infty$. It can be shown that the states are recurrent null
if m = 1, recurrent positive if m < 1 (see Karlin, 1966, pp. 74ff).
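
The two test functions used in Example 6.9.3 can be checked exactly for a concrete arrival distribution. The sketch below is an illustration under assumptions not in the text: the arrival law has finitely many values, `p[k]` $= p_k$, and the function names are hypothetical. It evaluates $\sum_j p_{ij} f(j) - f(i)$ for i > 0, using the fact that row i of the transition matrix is $p_{ij} = p_{j-i+1}$.

```python
from fractions import Fraction

def mean_arrivals(p):
    """m = sum_k k * p[k]."""
    return sum(k * pk for k, pk in enumerate(p))

def drift(p, f, i):
    """Return sum_j p_ij f(j) - f(i) for a state i > 0 of the queueing chain.

    From state i > 0 the chain moves to i - 1 + k with probability p[k].
    """
    if i <= 0:
        raise ValueError("the formula applies to states i > 0 only")
    return sum(pk * f(i - 1 + k) for k, pk in enumerate(p)) - f(i)

if __name__ == "__main__":
    heavy = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]   # m = 5/4 > 1
    light = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]   # m = 3/4 < 1
    r = Fraction(1, 2)  # root of sum_k p_k r^k = r for the heavy law
    print([drift(heavy, lambda j: r ** j, i) for i in range(1, 5)])
    print([drift(light, lambda j: j, i) for i in range(1, 5)])
```

For the first law the drift of $f(i) = r^i$ is exactly 0, which is the hypothesis of 6.9.1 (transience); for the second law the drift of f(i) = i equals m − 1 < 0, which is the hypothesis of 6.9.2 (recurrence).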


Problems 


1. Consider the Markov chain in Section 6.3, Problem 2, and assume go < 1. 
If r > 1 in (b), show that $X_n \to -\infty$ a.e.; hence the states are transient.
CHAPTER 7

THE CENTRAL LIMIT THEOREM


7.1 INTRODUCTION 

If $X_1, X_2, \ldots$ are independent, identically distributed random variables with
zero mean, and $S_n = X_1 + \cdots + X_n$, the strong law of large numbers states
that $S_n/n$ converges a.e. to 0. Thus given ε > 0, $|S_n/n|$ will be less than ε
for large n; in other words, $S_n$ will eventually be small in comparison with n.
The strong law of large numbers gives no information about the distribution
of $S_n$; the purpose of this chapter is to develop results (called versions of the
central limit theorem) concerning the approximate distribution of $S_n$ for large
n. For example, if the $X_n$ are iid with finite mean m and finite variance $\sigma^2$,
then for large n, $(S_n - nm)/(\sqrt{n}\,\sigma)$ has, approximately, the normal distribution
with mean 0 and variance 1.

There are two basic techniques that will be used. First is the theory of
weak convergence. If $\mu, \mu_1, \mu_2, \ldots$ are finite measures on $\mathcal{B}(\mathbb{R})$, weak
convergence of $\mu_n$ to $\mu$ means that $\int_{\mathbb{R}} f\,d\mu_n \to \int_{\mathbb{R}} f\,d\mu$ for every bounded
continuous $f\colon \mathbb{R} \to \mathbb{R}$. If the corresponding (bounded) distribution functions
are $F, F_1, F_2, \ldots$, the equivalent condition is that $F_n(a, b] \to F(a, b]$ at all
continuity points $a, b$ of $F$. (See 2.8 for a discussion of weak convergence; in
particular, recall that $+\infty$ and $-\infty$ are, by definition, continuity points of $F$.)
We shall denote weak convergence by $\mu_n \xrightarrow{w} \mu$ or $F_n \xrightarrow{w} F$. Also, if $B$
is a Borel subset of $\mathbb{R}$, the terms $F(B)$ and $\mu(B)$ will be synonymous.

Now assume that $\mu_n$ is the probability measure induced by a random variable
$X_n$, $n = 0, 1, \ldots$ (with $\mu_0 = \mu$, $X_0 = X$). If $\mu_n \xrightarrow{w} \mu$, we say that the
sequence $\{X_n\}$ converges in distribution to $X$, and write $X_n \xrightarrow{d} X$. Since
$\int_{\mathbb{R}} f\,d\mu_n = E[f(X_n)]$, it follows that $X_n \xrightarrow{d} X$ iff $E[f(X_n)] \to E[f(X)]$
for all bounded continuous $f\colon \mathbb{R} \to \mathbb{R}$.
This in turn implies that if $X_n \xrightarrow{d} X$ and $g$ is a continuous function from
$\mathbb{R}$ to $\mathbb{R}$, then $g(X_n) \xrightarrow{d} g(X)$, since $f \circ g$ is bounded and continuous whenever $f$ is.
In particular, if $X_n \xrightarrow{d} X$, then $X_n + c \xrightarrow{d} X + c$ for each real number $c$.

Notice that convergence in distribution is determined completely by the
distribution functions, or equivalently by the induced probability measures, of
the random variables. In particular, the random variables need not be defined
on the same probability space. Note also that since the distribution function
of a random variable always has the value 0 at $-\infty$ and the value 1 at $+\infty$,
we have $X_n \xrightarrow{d} X$ iff $F_n(x) \to F(x)$ at all continuity points of $F$ in $\mathbb{R}$.
Now by Theorem 2.8.1, $\mu_n \xrightarrow{w} \mu$ iff $\mu_n(A) \to \mu(A)$ for each Borel set
$A$ whose boundary $\partial A$ has $\mu$-measure 0. Thus $X_n \xrightarrow{d} X$ iff $P\{X_n \in A\}
\to P\{X \in A\}$ for all Borel sets $A$ such that $P\{X \in \partial A\} = 0$. This result justifies
the terminology “convergence in distribution,” for it says that if $X_n \xrightarrow{d} X$,
then $X_n$ and $X$ have approximately the same distribution for large $n$. Of course
it might seem more reasonable to require that $P\{X_n \in A\} \to P\{X \in A\}$ for all
Borel sets $A$, but actually this is not so. For example, if $X_n$ is uniformly
distributed between 0 and $1/n$, that is, $X_n$ has density $f_n(x) = n$, $0 < x < 1/n$,
$f_n(x) = 0$ elsewhere, then for large $n$, $X_n$ approximates a random variable $X$
that is identically 0. But $P\{X_n = 0\} = 0$ for all $n$, and $P\{X = 0\} = 1$.
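
The uniform example can be checked directly from the distribution functions. The sketch below (function names are hypothetical) evaluates $F_n$ and $F$ at a few continuity points of $F$ and at the single discontinuity $x = 0$, where convergence fails.

```python
def F_n(x, n):
    """Distribution function of X_n, uniform on (0, 1/n)."""
    if x <= 0:
        return 0.0
    return min(1.0, n * x)

def F(x):
    """Distribution function of the random variable identically 0."""
    return 1.0 if x >= 0 else 0.0

# Continuity points of F (every x != 0): F_n(x) settles at F(x) once 1/n <= x.
for x in (-0.5, 0.01, 2.0):
    print(x, [F_n(x, n) for n in (1, 10, 1000)], F(x))

# At the discontinuity x = 0 there is no convergence: F_n(0) = 0 for all n, F(0) = 1.
print([F_n(0.0, n) for n in (1, 10, 1000)], F(0.0))
```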

The second technique involves the use of characteristic functions, which we 
now define. 


7.1.1 Definition. Let $\mu$ be a finite measure on $\mathcal{B}(\mathbb{R})$. The characteristic
function of $\mu$ is the mapping from $\mathbb{R}$ to $\mathbb{C}$ given by

\[ h(u) = \int_{\mathbb{R}} e^{iux}\,d\mu(x), \qquad u \in \mathbb{R}. \]

Thus $h$ is the Fourier transform of $\mu$. If $F$ is a distribution function corresponding
to $\mu$, we shall also write $h(u) = \int_{\mathbb{R}} e^{iux}\,dF(x)$, and call $h$ the characteristic
function of $F$ (or of $X$ if $X$ is a random variable with distribution function $F$).

Characteristic functions are uniquely appropriate in the study of sums of 
independent random variables, because of the following result. 


7.1.2 Theorem. Let $X_1, X_2, \ldots, X_n$ be independent random variables, and
let $S_n = X_1 + \cdots + X_n$. Then the characteristic function of $S_n$ is the product
of the characteristic functions of the $X_i$.
PROOF.
\[ E(e^{iuS_n}) = E\Bigl(\prod_{k=1}^{n} e^{iuX_k}\Bigr) = \prod_{k=1}^{n} E(e^{iuX_k}) \quad \text{by independence.} \qquad \square \]
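
Theorem 7.1.2 can be verified exactly for discrete random variables, where the distribution of the sum is a finite convolution. The sketch below uses a fair die and a biased coin as assumed examples; the helper names are hypothetical.

```python
import cmath

def char_fn(pmf, u):
    """Characteristic function E[exp(iuX)] of a discrete distribution {value: prob}."""
    return sum(p * cmath.exp(1j * u * x) for x, p in pmf.items())

def convolve(pmf_a, pmf_b):
    """Distribution of X + Y for independent X ~ pmf_a and Y ~ pmf_b."""
    out = {}
    for x, p in pmf_a.items():
        for y, q in pmf_b.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

die = {k: 1 / 6 for k in range(1, 7)}
coin = {0: 0.3, 1: 0.7}
total = convolve(die, coin)

# Largest gap between the cf of the sum and the product of the cfs on a few test points.
max_gap = max(abs(char_fn(total, u) - char_fn(die, u) * char_fn(coin, u))
              for u in (-2.0, -0.5, 0.0, 0.7, 3.1))
print(max_gap)   # should be at the level of floating-point rounding
```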


Theorem 7.1.2 allows us to compute the characteristic function of $S_n$, knowing
only the distribution of the individual $X_i$'s. In fact, once the characteristic
function is known, the distribution function is determined.


7.1.3 Inversion Formula. If $h$ is the characteristic function of the bounded
distribution function $F$, and $F(a, b] = F(b) - F(a)$, then

\[ F(a, b] = \lim_{c \to \infty} \frac{1}{2\pi} \int_{-c}^{c} \frac{e^{-iua} - e^{-iub}}{iu}\, h(u)\,du \]

for all points $a, b$ ($a < b$) at which $F$ is continuous. If in addition, $h$ is Lebesgue
integrable on $(-\infty, \infty)$, then the function $f$ given by

\[ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} h(u)\,du \]

is a density for $F$, that is, $f$ is nonnegative and $F(x) = \int_{-\infty}^{x} f(t)\,dt$ for all
$x$; furthermore, $F' = f$ everywhere. Thus in this case, $f$ and $h$ are “Fourier
transform pairs”:

\[ h(u) = \int_{-\infty}^{\infty} e^{iux} f(x)\,dx, \qquad f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} h(u)\,du. \]


If we are trying to compute the distribution of a sum S, of independent 
random variables X;, then Theorem 7.1.2 will be useful only if we can recover 
a distribution function from its characteristic function. In fact this is the case. 
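
As a numerical illustration of the density part of 7.1.3, the sketch below recovers the standard normal density from its characteristic function $h(u) = e^{-u^2/2}$ by a trapezoid rule. The cutoff and step count are assumptions chosen for this example, not part of the text.

```python
import cmath
import math

def h_normal(u):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-0.5 * u * u)

def density_from_cf(h, x, cutoff=12.0, steps=4000):
    """Trapezoid approximation of (1/2pi) * integral of exp(-iux) h(u) over [-cutoff, cutoff]."""
    du = 2 * cutoff / steps
    total = 0j
    for j in range(steps + 1):
        u = -cutoff + j * du
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * u * x) * h(u)
    return (total * du / (2 * math.pi)).real

results = []
for x in (0.0, 1.0, 2.5):
    exact = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    results.append((x, density_from_cf(h_normal, x), exact))
    print(results[-1])
```

The truncation is harmless here because $h$ decays very fast; for a slowly decaying $h$ the cutoff would have to be chosen with more care.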


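PROOF. Intuitively, if $h(u) = \int_{-\infty}^{\infty} e^{iux}\,dF(x)$, then $h(u) = \int_{-\infty}^{\infty} e^{iux} F'(x)\,dx$, so

\[ F'(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} h(u) e^{-iux}\,du. \]

Thus

\[ F(a, b] = \int_a^b F'(x)\,dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} h(u) \left[ \int_a^b e^{-iux}\,dx \right] du \]

and since

\[ \int_a^b e^{-iux}\,dx = \frac{e^{-iua} - e^{-iub}}{iu}, \]

this leads to the inversion formula. For a formal proof, let

\[ I_c = \frac{1}{2\pi} \int_{-c}^{c} \frac{e^{-iua} - e^{-iub}}{iu}\, h(u)\,du, \qquad a < b. \]
Then
\[ I_c = \frac{1}{2\pi} \int_{-c}^{c} \frac{e^{-iua} - e^{-iub}}{iu} \int_{-\infty}^{\infty} e^{iux}\,dF(x)\,du. \]
Now
\[ \left| \frac{e^{-iua} - e^{-iub}}{iu}\, e^{iux} \right| = \left| \frac{e^{-iua} - e^{-iub}}{iu} \right| = \left| \int_a^b e^{-iut}\,dt \right| \le b - a \]
and

\[ \int_{-c}^{c} \left[ \int_{-\infty}^{\infty} (b - a)\,dF(x) \right] du = 2c(b - a)[F(\infty) - F(-\infty)] < \infty. \]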
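By Fubini's theorem, the order of integration may be interchanged to obtain

\[ I_c = \int_{-\infty}^{\infty} \left[ \frac{1}{2\pi} \int_{-c}^{c} \frac{e^{-iua} - e^{-iub}}{iu}\, e^{iux}\,du \right] dF(x) = \int_{-\infty}^{\infty} J_c(x)\,dF(x), \]

where

\[ J_c(x) = \frac{1}{2\pi} \int_{-c}^{c} \frac{\sin u(x - a) - \sin u(x - b)}{u}\,du. \]

[Note that

\[ \frac{1}{2\pi} \int_{-c}^{c} \frac{\cos u(x - a) - \cos u(x - b)}{iu}\,du = 0 \]

since the integrand is an odd function.] Let $v = u(x - a)$ and $w = u(x - b)$ to
obtain

\[ J_c(x) = \frac{1}{2\pi} \int_{-c(x-a)}^{c(x-a)} \frac{\sin v}{v}\,dv - \frac{1}{2\pi} \int_{-c(x-b)}^{c(x-b)} \frac{\sin w}{w}\,dw. \]

Now

\[ \int_r^s \frac{\sin v}{v}\,dv \to \pi \quad \text{as } s \to \infty \text{ and } r \to -\infty, \]

and since the integral is continuous in $r$ and $s$, it is bounded uniformly in $r$
and $s$. Thus for some $M < \infty$, $|J_c(x)| \le M$ for all $c$ and $x$; furthermore, as
$c \to \infty$, $J_c(x) \to J(x)$, where

\[
J(x) = \begin{cases}
0 & \text{if } x < a \text{ or } x > b, \\
1 & \text{if } a < x < b, \\
\tfrac12 & \text{if } x = a \text{ or } x = b.
\end{cases}
\]

By the dominated convergence theorem,
\[
\begin{aligned}
\lim_{c \to \infty} I_c &= \int_{-\infty}^{\infty} J(x)\,dF(x) \\
&= F(b^-) - F(a) + \tfrac12 [F(a) - F(a^-) + F(b) - F(b^-)] \\
&= \frac{F(b) + F(b^-)}{2} - \frac{F(a) + F(a^-)}{2} = F(b) - F(a)
\end{aligned}
\]

if $F$ is continuous at $a$ and $b$. This proves the formula.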
Now assume $h$ integrable on $(-\infty, \infty)$. Let

\[ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} h(u)\,du, \qquad -\infty < x < \infty. \]

Since $h$ is integrable, $f$ is well-defined; furthermore, $f$ is continuous by the
dominated convergence theorem. Now by Fubini's theorem,

\[
\begin{aligned}
\int_a^b f(x)\,dx &= \frac{1}{2\pi} \int_{-\infty}^{\infty} h(u) \left[ \int_a^b e^{-iux}\,dx \right] du \\
&= \lim_{c \to \infty} \frac{1}{2\pi} \int_{-c}^{c} h(u) \left[ \int_a^b e^{-iux}\,dx \right] du \\
&= \lim_{c \to \infty} \frac{1}{2\pi} \int_{-c}^{c} \frac{e^{-iua} - e^{-iub}}{iu}\, h(u)\,du \\
&= F(b) - F(a)
\end{aligned}
\]

by the inversion formula if $a$ and $b$ are continuity points of $F$. Thus $F(b)
- F(a) = \int_a^b f(x)\,dx$ at continuity points of $F$.

But every point is a limit from above of continuity points since $F$ is monotone
and thus has only countably many discontinuities. Since the integral is a
continuous function of its limits, it follows that $F(b) - F(a) = \int_a^b f(x)\,dx$ for
all $a$ and $b$.
Since $f$ is continuous, it follows that $F$ is differentiable everywhere and its
derivative is $f$. Since $F$ is increasing, $f$ is everywhere nonnegative. Thus $f$
is a density for $F$. $\square$


7.1.4 Corollary. Let $P_1$ and $P_2$ be probability measures (or more generally,
finite measures) on $\mathcal{B}(\mathbb{R})$. If $\int_{\mathbb{R}} e^{iux}\,dP_1(x) = \int_{\mathbb{R}} e^{iux}\,dP_2(x)$ for all $u \in \mathbb{R}$, then
$P_1 = P_2$.


PROOF. By the inversion formula 7.1.3, $h$ determines $F$ at all continuity
points. But as in the proof of 7.1.3, every point is a limit from above of
continuity points, and it follows that $h$ determines $F$ everywhere. $\square$


Various procedures involving Fourier or Laplace transforms may be used in 
actually computing the distribution of a random variable from its characteristic 
function (see Ash, 1970, Chapter 5). 

Characteristic functions have the following basic properties. 


7.1.5 Theorem. Let $h$ be the characteristic function of the bounded
distribution function $F$. Then

(a) $|h(u)| \le h(0) = F(\infty) - F(-\infty)$ for all $u$;

(b) $h$ is continuous on $\mathbb{R}$;

(c) $h(-u) = \overline{h(u)}$, the complex conjugate of $h(u)$;

(d) $h(u)$ is real-valued for all $u$ iff $F$ is symmetric; that is, $\int_B dF(x)
= \int_{-B} dF(x)$ for all Borel sets $B$, where $-B = \{-x : x \in B\}$.

(e) If $\int_{\mathbb{R}} |x|^r\,dF(x) < \infty$ for some positive integer $r$, then the $r$th derivative
of $h$ exists and is continuous on $\mathbb{R}$, and

\[ h^{(r)}(u) = \int_{\mathbb{R}} (ix)^r e^{iux}\,dF(x). \]


PROOF. Part (a) is clear since $|e^{iux}| = 1$, and (b) follows from the dominated
convergence theorem (see Problem 1, 1.6). Part (c) follows from the fact that
the conjugate of $e^{iux}$ is $e^{-iux}$. To prove (d), assume $h$ to be real-valued and,
for now, let $F$ be the distribution function of the random variable $X$. Now
$E(e^{-iuX}) = h(-u) = \overline{h(u)} = h(u)$ as $h(u)$ is real, so that $-X$ has characteristic
function $h$. By 7.1.4, $X$ and $-X$ have the same distribution; hence $P\{X \in B\}
= P\{X \in -B\}$, and $F$ is therefore symmetric. In general, we may multiply
$F$ by a positive constant $c$ such that $c(F(\infty) - F(-\infty)) = 1$. [If $F(\infty)
- F(-\infty) = 0$, then $\int_B dF(x) = 0$ for all $B \in \mathcal{B}(\mathbb{R})$ and the result is trivial.]
The above argument shows that $\int_B c\,dF(x) = \int_{-B} c\,dF(x)$, and hence $F$ is
symmetric.

Conversely, assume $F$ symmetric, and let $g\colon \mathbb{R} \to \mathbb{R}$ be a bounded
Borel measurable function. If $g$ is odd [$g(-x) = -g(x)$ for all $x$], then
$\int_{\mathbb{R}} g(x)\,dF(x) = 0$; to prove this, approximate $g$ by odd simple functions. In
particular, $\int_{\mathbb{R}} \sin ux\,dF(x) = 0$, and $h$ is real-valued.

Finally, we prove (e). Since $|(ix)^r e^{iux}| = |x|^r$ and $\int_{\mathbb{R}} |x|^r\,dF(x) < \infty$, we
may differentiate $h(u) = \int_{\mathbb{R}} e^{iux}\,dF(x)$ $r$ times under the integral sign (see 1.6,
Problem 3 and 2.4, Problem 8), and the result follows. $\square$


Now suppose that $h$ is a given function from $\mathbb{R}$ to $\mathbb{C}$, and we wish to
determine whether or not $h$ is the characteristic function of some bounded
distribution function $F$. There are some practical approaches that often work.
For example, if $h$ fails to satisfy (a), (b), or (c) of 7.1.5, $h$ cannot be a
characteristic function. Also, suppose $h$ is continuous and Lebesgue integrable
on $\mathbb{R}$, and we compute

\[ f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-iux} h(u)\,du, \qquad x \in \mathbb{R}. \]

If $f$ is Lebesgue integrable, then

\[ h(u) = \int_{\mathbb{R}} e^{iux} f(x)\,dx, \qquad u \in \mathbb{R}. \]

(This is a standard Fourier transform result; see Rudin, 1966, p. 186.) Thus if
$f$ is everywhere nonnegative, then $h$ is the characteristic function of a measure
$\mu$ with density $f$, that is,

\[ \mu(B) = \int_B f(x)\,dx, \qquad B \in \mathcal{B}(\mathbb{R}). \]

There is a general criterion for deciding whether or not $h$ is a characteristic
function. The result is of considerable importance in the development of
second-order properties of stochastic processes. However, it is in general not
useful when applied to explicit examples.


7.1.6 Bochner's Theorem. If $h\colon \mathbb{R} \to \mathbb{C}$, then $h$ is a characteristic function
iff $h$ is continuous at the origin and is nonnegative definite, in other words,
for all $u_1, \ldots, u_n \in \mathbb{R}$, $n = 1, 2, \ldots$, and all complex numbers $a_1, \ldots, a_n$,

\[ \sum_{j,k=1}^{n} a_j\, h(u_j - u_k)\, \bar{a}_k \]

is real and nonnegative.




PARTIAL PROOF. Assume $h$ is a characteristic function; then $h$ is continuous
everywhere by 7.1.5(b). To prove nonnegative definiteness, write

\[
0 \le \int_{\mathbb{R}} \Bigl| \sum_{j=1}^{n} a_j e^{iu_j x} \Bigr|^2 dF(x)
= \sum_{j,k=1}^{n} a_j \bar{a}_k \int_{\mathbb{R}} e^{i(u_j - u_k)x}\,dF(x)
= \sum_{j,k=1}^{n} a_j\, h(u_j - u_k)\, \bar{a}_k.
\]

The converse is considerably more difficult, and since the result will not be
used in the text, the argument will be omitted. For a complete proof, see Loève
(1955) or Ash and Gardner (1975). $\square$
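
The identity used in the partial proof can be checked numerically. In the sketch below, the discrete distribution, the points $u_j$, and the coefficients $a_j$ are arbitrary choices made for illustration; the quadratic form is compared with the integral of $|\sum_j a_j e^{iu_j x}|^2$.

```python
import cmath

pmf = {-1.0: 0.2, 0.5: 0.5, 2.0: 0.3}      # an arbitrary discrete distribution (assumed)

def h(u):
    """Characteristic function of the distribution above."""
    return sum(p * cmath.exp(1j * u * x) for x, p in pmf.items())

us = [-1.3, 0.0, 0.4, 2.2]                  # arbitrary real points u_j
a = [1 + 2j, -0.5j, 3 + 0j, -1 + 1j]        # arbitrary complex coefficients a_j
n = len(us)

# Quadratic form: sum over j, k of a_j h(u_j - u_k) conj(a_k).
quad = sum(a[j] * h(us[j] - us[k]) * a[k].conjugate()
           for j in range(n) for k in range(n))

# Integral of |sum_j a_j exp(i u_j x)|^2 with respect to the distribution.
integral = sum(p * abs(sum(a[j] * cmath.exp(1j * us[j] * x) for j in range(n))) ** 2
               for x, p in pmf.items())

print(quad, integral)   # quad should be real (up to rounding) and equal to integral
```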


We shall need the result that if $X$ is normally distributed with mean $m$ and
variance $\sigma^2$, the characteristic function of $X$ is

\[ h(u) = \exp(ium) \exp\bigl(-\tfrac12 u^2 \sigma^2\bigr). \]

For the computation, see Ash, 1970, p. 163.
Another basic result is the relation between convergence in distribution and
convergence in probability.


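7.1.7 Theorem. (a) If $X_n$ converges to $X$ in probability, then $X_n$ converges
to $X$ in distribution. (b) A partial converse: If $X_n$ converges in distribution
to a constant $c$, then $X_n$ converges in probability to $c$.


PROOF. (a) Let $F_n$ be the distribution function of $X_n$, and $F$ the distribution
function of $X$. Then

\[
\begin{aligned}
F_n(x) = P\{X_n \le x\} &= P\{X_n \le x,\, X > x + \varepsilon\} + P\{X_n \le x,\, X \le x + \varepsilon\} \\
&\le P\{|X_n - X| \ge \varepsilon\} + P\{X \le x + \varepsilon\} \\
&= P\{|X_n - X| \ge \varepsilon\} + F(x + \varepsilon),
\end{aligned}
\]
and
\[
\begin{aligned}
F(x - \varepsilon) = P\{X \le x - \varepsilon\} &= P\{X \le x - \varepsilon,\, X_n > x\} + P\{X \le x - \varepsilon,\, X_n \le x\} \\
&\le P\{|X_n - X| \ge \varepsilon\} + P\{X_n \le x\} \\
&= P\{|X_n - X| \ge \varepsilon\} + F_n(x).
\end{aligned}
\]
Thus
\[ F(x - \varepsilon) - P\{|X_n - X| \ge \varepsilon\} \le F_n(x) \le P\{|X_n - X| \ge \varepsilon\} + F(x + \varepsilon). \]

Since $X_n \xrightarrow{P} X$, we have $P\{|X_n - X| \ge \varepsilon\} \to 0$ as $n \to \infty$. If $F$ is
continuous at $x$ then $F(x - \varepsilon)$ and $F(x + \varepsilon) \to F(x)$ as $\varepsilon \to 0$. Thus $F_n(x)$ is
boxed between two quantities that can be made arbitrarily close to $F(x)$, so
$F_n(x) \to F(x)$.

(b)
\[
\begin{aligned}
P\{|X_n - c| \ge \varepsilon\} &= P\{X_n \ge c + \varepsilon\} + P\{X_n \le c - \varepsilon\} \\
&= 1 - P\{X_n < c + \varepsilon\} + P\{X_n \le c - \varepsilon\}.
\end{aligned}
\]

Now $P\{X_n \le c + \varepsilon/2\} \le P\{X_n < c + \varepsilon\}$, so

\[
\begin{aligned}
P\{|X_n - c| \ge \varepsilon\} &\le 1 - P\{X_n \le c + \varepsilon/2\} + P\{X_n \le c - \varepsilon\} \\
&= 1 - F_n(c + \varepsilon/2) + F_n(c - \varepsilon).
\end{aligned}
\]

But as long as $x \ne c$, $F_n(x)$ converges to the distribution function of the
constant $c$, that is,

\[
F_n(x) \to \begin{cases} 1 & \text{if } x > c, \\ 0 & \text{if } x < c. \end{cases}
\]

Thus $F_n(c + \varepsilon/2) \to 1$ and $F_n(c - \varepsilon) \to 0$ as $n \to \infty$ and, therefore,
$P\{|X_n - c| \ge \varepsilon\} \to 0$ as $n \to \infty$. $\square$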


Problems 


1. Give an example of a sequence of random variables $X_n$ such that $X_n$
converges in distribution to $X$, but $X_n$ does not converge in probability
to $X$.

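2. The following application of Theorem 7.1.4 is useful in computations
involving characteristic functions. Let $f$ and $g$ be nonnegative Borel
measurable functions from $\mathbb{R}$ to $\mathbb{R}$, and assume that for some fixed real $t$,
$\int_{\mathbb{R}} f(x) e^{-tx}\,dx < \infty$ and $\int_{\mathbb{R}} g(x) e^{-tx}\,dx < \infty$. If

\[ \int_{\mathbb{R}} f(x) e^{-tx} e^{iux}\,dx = \int_{\mathbb{R}} g(x) e^{-tx} e^{iux}\,dx \quad \text{for all } u \in \mathbb{R}, \]

show that $f = g$ a.e. (Lebesgue measure).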


3. If $h_1$ and $h_2$ are characteristic functions, show that $h_1 + h_2$ and $\operatorname{Re} h_1$ are
also characteristic functions. Is $\operatorname{Im} h_1$ a characteristic function?


4. Let $h$ be the characteristic function of the random variable $X$.

(a) If $|h(u)| = 1$ for some $u \ne 0$, show that $X$ has a lattice distribution,
that is, with probability 1, $X$ belongs to the set $\{a + nk : n \text{ an integer}\}$
for appropriate $a$ and $k$ ($= 2\pi u^{-1}$). Conversely, if $X$ has a lattice
distribution, then $|h(u)| = 1$ for some $u \ne 0$.

(b) If $|h(u)| = 1$ at two distinct points $u$ and $\alpha u$, where $\alpha$ is irrational,
show that $X$ is degenerate, that is, a.e. constant.
5. Let $X$ be a random variable with $E(|X|^n) < \infty$ for some positive integer
$n$. If $h$ is the characteristic function of $X$, show that $h^{(k)}(0) = i^k E(X^k)$,
$k = 0, \ldots, n$, and

(a) $\displaystyle h(u) = \sum_{k=0}^{n} \frac{E(X^k)}{k!} (iu)^k + o(u^n)$,

(b) $\displaystyle \Bigl| h(u) - \sum_{k=0}^{n-1} \frac{E(X^k)}{k!} (iu)^k \Bigr| \le \frac{E(|X|^n)\,|u|^n}{n!}$.


6. (a) If $X$ is a random variable with $E(|X|^r) < \infty$ for all $r > 0$, show that
the characteristic function of $X$ is given by

\[ h(u) = \sum_{n=0}^{\infty} \frac{E(X^n)}{n!} (iu)^n \]

within the interval of convergence of the series. This is the moment-generating
property of characteristic functions.

(b) Give an example of a random variable $X$ with $E(|X|^r) < \infty$ for all
$r > 0$, such that the series

\[ \sum_{n=0}^{\infty} \frac{E(X^n)}{n!} (iu)^n \]
converges only at $u = 0$.


7. (a) Let $h$ be the characteristic function of the bounded distribution function
$F$. Define
\[ E_r h(u) = h(u + r), \qquad r \text{ real.} \]

Show that

\[ \left( \frac{E_r - E_{-r}}{2r} \right)^{2} h(0) = -\int_{-\infty}^{\infty} \left( \frac{\sin rx}{rx} \right)^{2} x^2\,dF(x). \]

[$(E_r - E_{-r})h(0) = h(r) - h(-r)$; $(E_r - E_{-r})^2 h(0)$ means
$(E_r - E_{-r})(h(r) - h(-r)) = h(2r) - 2h(0) + h(-2r)$.]

(b) If $h''$ exists and is finite at the origin, show that $\int_{-\infty}^{\infty} x^2\,dF(x) < \infty$.
(Use L'Hôpital's rule and Fatou's lemma.)

(c) If $h^{(2n)}(0)$ exists and is finite, show that $\int_{-\infty}^{\infty} x^{2n}\,dF(x) < \infty$
($n = 1, 2, \ldots$).

[It is probably easier to use part (b) and an induction argument rather than
to extend part (a).]
8. If $X$ is a random variable, let $N(s) = E(e^{-sX})$, $s$ complex; when $s = -iu$,
we obtain the characteristic function of $X$. If $N$ is analytic at the origin,
show that $E(X^k)$ is finite for all $k > 0$, and

\[ N(s) = \sum_{k=0}^{\infty} \frac{(-1)^k E(X^k)}{k!}\, s^k \]

within the circle of convergence of the series. In particular, $N^{(k)}(0)
= (-1)^k E(X^k)$.


7.2 THE FUNDAMENTAL WEAK COMPACTNESS THEOREM

The basic connection between weak convergence and characteristic functions
is essentially this. Let $\{F_n\}$ be a bounded sequence of distribution functions
on $\mathbb{R}$ (“bounded” means that for some positive $M$, $F_n(\infty) - F_n(-\infty)
\le M$ for all $n$). Let $\{h_n\}$ be the corresponding sequence of characteristic
functions. If $F$ is a bounded distribution function with characteristic function $h$,
then weak convergence of $F_n$ to $F$ is equivalent to pointwise convergence of
$h_n$ to $h$. In the course of developing this result, we must consider the following
question. If $\{F_n\}$ is a bounded sequence of distribution functions, when will
there exist a weakly convergent subsequence? Now any bounded sequence
of real numbers has a convergent subsequence, so one might conjecture that
any bounded sequence of distribution functions has a weakly convergent
subsequence. In fact this is not true, but the following result comes close, in a
sense.


7.2.1 Helly's Theorem. Let $F_1, F_2, \ldots$ be distribution functions on $\mathbb{R}$. Assume
that $F_n(-\infty) = 0$ for all $n$, and $F_n(\infty) \le M < \infty$ for all $n$. Then there
is a distribution function $F$ and a subsequence $\{F_{n_k}\}$ such that $F_{n_k}(x) \to F(x)$
for each $x \in \mathbb{R}$ at which $F$ is continuous.


PROOF. Let $D = \{x_1, x_2, \ldots\}$ be a countable dense subset of $\mathbb{R}$. Since the
sequence $\{F_n(x_1)\}$ is bounded, we can extract a subsequence $\{F_{1j}\}$ of $\{F_n\}$
with $F_{1j}(x_1)$ converging to a limit $y_1$ as $j \to \infty$. Since $\{F_{1j}(x_2)\}$ is bounded,
there is a subsequence $\{F_{2j}\}$ of $\{F_{1j}\}$ such that $F_{2j}(x_2)$ approaches a limit
$y_2$. Continuing inductively, we find subsequences $\{F_{mj}\}$ of $\{F_{m-1,j}\}$ with
$F_{mj}(x_m) \to y_m$, $m = 1, 2, \ldots$ (of course all $|y_m|$ are bounded by $M$).

Define $F_D\colon D \to \mathbb{R}$ by $F_D(x_j) = y_j$, $j = 1, 2, \ldots$, and let $F_{n_k} = F_{kk}$,
$k = 1, 2, \ldots$ (the “diagonal sequence”). Then $F_{n_k}(x) \to F_D(x)$, $x \in D$. Since
$F_{n_k}$ is one of the original $F_n$, $x < y$ implies $F_{n_k}(x) \le F_{n_k}(y)$; hence $F_D(x)
\le F_D(y)$. Define

\[ F(x) = \inf\{F_D(y) : y \in D,\ y > x\}. \]

By definition, $F$ is increasing. To prove that $F$ is right-continuous, let $z_n \downarrow x$;
then $F(z_n)$ approaches a limit $b \ge F(x)$. If $F(x) < b$, let $y_0 \in D$, $y_0 > x$, with
$F_D(y_0) < b$. For large $n$ we have $x < z_n < y_0$, so $F(z_n) \le F_D(y_0) < b$. Thus
$\lim F(z_n) < b$, a contradiction, and therefore $F(z_n) \to F(x)$.

Now we show that $F_{n_k}(x) \to F(x)$ at continuity points of $F$ in $\mathbb{R}$. If
$x < y \in D$, we have $\limsup_{k \to \infty} F_{n_k}(x) \le \limsup_{k \to \infty} F_{n_k}(y) = F_D(y)$; take
the inf over $y \in D$, $y > x$ to obtain $\limsup_{k \to \infty} F_{n_k}(x) \le F(x)$.

If $x' < y < x$, $y \in D$, we have $F(x') \le F_D(y) = \lim_{k \to \infty} F_{n_k}(y) = \liminf_{k \to \infty}
F_{n_k}(y) \le \liminf_{k \to \infty} F_{n_k}(x)$. Let $x' \to x$ to obtain $F(x^-) \le \liminf_{k \to \infty}
F_{n_k}(x)$. Thus if $F(x^-) = F(x)$, we have $F_{n_k}(x) \to F(x)$. $\square$


It must be emphasized that Helly's Theorem does not say that $F_{n_k}(\infty)
\to F(\infty)$. [Recall that $F(\infty)$ is $\lim_{x \to \infty} F(x)$; see the discussion after 1.4.2.] If
every $F_n$ is the distribution function of a random variable, so that $F_n(\infty) = 1$
for all $n$, it is possible for $F(\infty)$ to be strictly less than 1. (See the example
that follows in 7.2.2.)


7.2.2 Comments. If instead of assuming that $F_n(-\infty) = 0$ and $F_n(\infty) \le M$
for all $n$, we assume that $F_n(\infty) - F_n(-\infty) \le M < \infty$ for all $n$, then 7.2.1
implies that there is a distribution function $F$ and a subsequence $\{F_{n_k}\}$ with
$F_{n_k}(a, b] \to F(a, b]$ for all $a, b \in \mathbb{R}$ at which $F$ is continuous. [To see this,
consider $G_n(x) = F_n(x) - F_n(-\infty)$.]

If $F_1, F_2, \ldots$ are distribution functions on $\mathbb{R}^k$ [assumed monotone in each
coordinate and 0 at $(-\infty, \ldots, -\infty)$], and $F_n(\mathbb{R}^k) \le M < \infty$ for all $n$, there is
a distribution function $F$ and a subsequence $\{F_{n_j}\}$ converging to $F$ at continuity
points of $F$ in $\mathbb{R}^k$. The proof is essentially the same as above.

There is no difficulty in constructing a bounded sequence of distribution
functions with no weakly convergent subsequence. For example, let $F_n$ be
the distribution function of a random variable that is identically $n$ ($F_n(x) = 1$,
$x \ge n$; $F_n(x) = 0$, $x < n$). If $F_{n_k}$ converges weakly to $F$, then $F(a, b]$ must be
0 for all $a, b \in \mathbb{R}$, $a < b$; hence $F(\infty) - F(-\infty) = 0$. But $F_{n_k}(\infty) - F_{n_k}(-\infty)
= 1$, a contradiction.
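
The escape of mass in this example is easy to see numerically. The sketch below (the interval endpoints are arbitrary choices) computes the mass that $F_n$ assigns to a fixed interval $(a, b]$; it drops to 0 once $n > b$, although the total mass stays 1.

```python
def F_n(x, n):
    """Distribution function of the random variable identically equal to n."""
    return 1.0 if x >= n else 0.0

a, b = -50.0, 50.0
# Mass of the fixed interval (a, b] under F_n for increasing n.
masses = [F_n(b, n) - F_n(a, n) for n in (1, 10, 100, 1000)]
print(masses)
```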

A bounded sequence of distribution functions will always have a weakly
convergent subsequence unless, as in the above example, too much mass
escapes to infinity. Weak convergence requires $F_n(\infty) - F_n(-\infty) \to F(\infty)
- F(-\infty)$ (see 2.8.3 and 2.8.4). We now introduce a condition that guarantees
that mass does not escape.


7.2.3 Definition. Let $\mathcal{A} = \{\mu_i, i \in I\}$ be a family of finite measures on the
Borel sets of a metric space $\Omega$. We say that $\mathcal{A}$ is tight iff for each $\varepsilon > 0$,
there is a compact set $K \subset \Omega$ such that $\mu_i(\Omega - K) < \varepsilon$ for all $i$. (If $\Omega = \mathbb{R}^k$,
the compact set can be replaced by an interval.) We say that $\mathcal{A}$ is relatively
compact iff each sequence in $\mathcal{A}$ has a subsequence converging weakly to a
finite measure on $\mathcal{B}(\Omega)$.
If $\{F_i, i \in I\}$ is a family of distribution functions, tightness of $\{F_i\}$ means
tightness of the associated family of measures, and similarly for relative
compactness.

We may now prove the basic result. 


7.2.4 Prokhorov's Theorem. Let $\mathcal{A} = \{F_i, i \in I\}$ be a family of distribution
functions on $\mathbb{R}$, and assume that $F_i(\infty) - F_i(-\infty) \le M < \infty$ for all $i$.
Then the family is tight iff it is relatively compact.


PROOF. Assume $\mathcal{A}$ is tight, and let $F_1, F_2, \ldots$ be a sequence from $\mathcal{A}$.
By 7.2.1 and 7.2.2, there is a distribution function $F$ and a subsequence $\{F_{n_k}\}$
such that $F_{n_k}(a, b] \to F(a, b]$ for all $a, b \in \mathbb{R}$ at which $F$ is continuous. Given
$\varepsilon > 0$, let $a$ and $b$ be finite continuity points of $F$ such that $F_n(\mathbb{R} - (a, b]) < \varepsilon$
for all $n$, and $F(\mathbb{R} - (a, b]) < \varepsilon$. If $x \in \mathbb{R}$ and $x$ is a continuity point of $F$, then

\[ F_{n_k}(\infty) - F_{n_k}(x) = F_{n_k}(\infty) - F_{n_k}(b) + F_{n_k}(b) - F_{n_k}(x). \]

But $F_{n_k}(b) - F_{n_k}(x) \to F(b) - F(x)$, $F_n(\infty) - F_n(b) < \varepsilon$ for all $n$, and
$F(\infty) - F(b) < \varepsilon$. It follows that for sufficiently large $k$, $F_{n_k}(\infty) - F_{n_k}(x)$
differs from $F(\infty) - F(x)$ by less than $2\varepsilon$. Therefore $F_{n_k}(\infty) - F_{n_k}(x)
\to F(\infty) - F(x)$, and similarly $F_{n_k}(x) - F_{n_k}(-\infty) \to F(x) - F(-\infty)$. Thus
$F_{n_k} \xrightarrow{w} F$, proving relative compactness.

Now if $\mathcal{A}$ is relatively compact but not tight, then for some $\varepsilon > 0$ we have,
for each positive integer $n$, an $F_n \in \mathcal{A}$ with $F_n(\mathbb{R} - (-n, n)) \ge \varepsilon$. If $\{F_{n_k}\}$
is a subsequence converging weakly to $F$, then since $\mathbb{R} - (-n, n)$ is closed,

\[ \limsup_{k} F_{n_k}(\mathbb{R} - (-n, n)) \le F(\mathbb{R} - (-n, n)). \]

For $n_k \ge n$ we have $F_{n_k}(\mathbb{R} - (-n, n)) \ge F_{n_k}(\mathbb{R} - (-n_k, n_k)) \ge \varepsilon$.
Thus $F(\mathbb{R} - (-n, n)) \ge \varepsilon$ for all $n$, and if we let $n \to \infty$, we obtain $0 \ge \varepsilon$,
a contradiction. $\square$


Prokhorov's theorem yields the following result for sequences of random
variables. If for each $n = 1, 2, \ldots$, $X_n$ is a random variable with distribution
function $F_n$, and the sequence $\{F_n\}$ is tight, then there is a subsequence $\{F_{n_k}\}$
and a random variable $X$ (possibly defined on a different probability space)
with distribution function $F$, such that $F_{n_k}$ converges weakly to $F$.

Prokhorov's theorem holds equally well for distribution functions on $\mathbb{R}^k$
(with $F_i(\mathbb{R}^k) \le M < \infty$ for all $i$); the proof is essentially the same as above.


7.2.5 Corollary. Let $\{F_n\}$ be a bounded sequence of distribution functions on
$\mathbb{R}$. If $\{F_n\}$ is tight and every weakly convergent subsequence of $\{F_n\}$ converges
to the distribution function $F$, then $F_n \xrightarrow{w} F$.
PROOF. If $F_n$ does not converge weakly to $F$, then $\int_{\mathbb{R}} f(x)\,dF_n(x) \not\to \int_{\mathbb{R}} f(x)\,dF(x)$
for some bounded continuous $f$. Hence there is an $\varepsilon > 0$ such that

\[ \left| \int_{\mathbb{R}} f(x)\,dF_n(x) - \int_{\mathbb{R}} f(x)\,dF(x) \right| \ge \varepsilon \]

for infinitely many $n$, say for $n \in T$. By 7.2.4, $\{F_n, n \in T\}$ has a subsequence
$\{F_{n_k}\}$ converging weakly to a distribution function $G$, and $G = F$ by hypothesis.
But then $\int_{\mathbb{R}} f(x)\,dF_{n_k}(x) \to \int_{\mathbb{R}} f(x)\,dF(x)$, a contradiction. $\square$


7.2.6 Corollary. Let $\{F_n\}$ be a bounded sequence of distribution functions
on $\mathbb{R}$. If $\{F_n\}$ is tight, then $\{F_n\}$ converges weakly iff $\int_{\mathbb{R}} e^{iux}\,dF_n(x)$ approaches
a finite limit as $n \to \infty$ for each $u \in \mathbb{R}$.


PROOF. Assume $\int_{\mathbb{R}} e^{iux}\,dF_n(x)$ has a finite limit for all $u$. By 7.2.4, there is a
subsequence $\{F_{n_k}\}$ converging weakly to a distribution function $F$. If $F_n$ does not
converge weakly to $F$, by 7.2.5, there is a subsequence $\{F_{m_k}\}$ converging weakly
to a distribution function $G \ne F$. We know by hypothesis that $\int_{\mathbb{R}} e^{iux}\,dF_{n_k}(x)$
and $\int_{\mathbb{R}} e^{iux}\,dF_{m_k}(x)$ have the same limit as $k \to \infty$. Therefore $\int_{\mathbb{R}} e^{iux}\,dF(x) =
\int_{\mathbb{R}} e^{iux}\,dG(x)$ for all $u \in \mathbb{R}$. By 7.1.4, $F = G$, a contradiction; thus $F_n \xrightarrow{w} F$.
The converse follows from the definition of weak convergence. (In this proof, as
in 7.2.5, distribution functions that differ by a constant have been identified.) $\square$


One more result is needed before we can relate weak convergence to con- 
vergence of characteristic functions. 


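7.2.7 Truncation Inequality. Let $F$ be a bounded distribution function on
$\mathbb{R}$, with characteristic function $h$. If $u > 0$, then for some constant $k > 0$,

\[ \int_{|x| \ge 1/u} dF(x) \le \frac{k}{u} \int_0^u [h(0) - \operatorname{Re} h(v)]\,dv. \]

PROOF.

\[
\begin{aligned}
\frac{1}{u} \int_0^u [h(0) - \operatorname{Re} h(v)]\,dv
&= \frac{1}{u} \int_0^u \int_{-\infty}^{\infty} (1 - \cos vx)\,dF(x)\,dv \\
&= \int_{-\infty}^{\infty} \left[ \frac{1}{u} \int_0^u (1 - \cos vx)\,dv \right] dF(x) \quad \text{by Fubini's theorem} \\
&= \int_{-\infty}^{\infty} \left( 1 - \frac{\sin ux}{ux} \right) dF(x) \\
&\ge \inf_{|t| \ge 1} \left( 1 - \frac{\sin t}{t} \right) \int_{|ux| \ge 1} dF(x) \\
&= \frac{1}{k} \int_{|x| \ge 1/u} dF(x). \qquad \square
\end{aligned}
\]
In fact,

\[ \inf_{|t| \ge 1} \left( 1 - \frac{\sin t}{t} \right) = 1 - \sin 1 \ge \frac{1}{7}, \]

so we may take $k = 7$.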
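
The inequality with $k = 7$ can be checked numerically for a specific distribution. The sketch below assumes $F$ is the standard normal distribution function, for which both sides have closed forms in terms of the error function; the function names are hypothetical.

```python
import math

def tail_mass(u):
    """P{|X| >= 1/u} for X standard normal (left side of the inequality)."""
    return math.erfc(1.0 / (u * math.sqrt(2.0)))

def averaged_bound(u, k=7.0):
    """(k/u) * integral over [0, u] of [h(0) - Re h(v)] with h(v) = exp(-v^2/2)."""
    integral_h = math.sqrt(math.pi / 2.0) * math.erf(u / math.sqrt(2.0))
    return k / u * (u - integral_h)

checks = [(u, tail_mass(u), averaged_bound(u)) for u in (0.1, 0.5, 1.0, 2.0, 10.0)]
for row in checks:
    print(row)                       # (u, left side, right side)

print(1 - math.sin(1.0), 1 / 7)      # the constant behind the choice k = 7
```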


7.2.8 Lévy's Theorem. Let $\{F_n\}$ be a bounded sequence of distribution functions
on $\mathbb{R}$, and let $\{h_n\}$ be the corresponding sequence of characteristic
functions. If $F_n \xrightarrow{w} F$, where $F$ is a distribution function with characteristic
function $h$, then $h_n(u) \to h(u)$ for all $u$. Conversely, if $h_n$ converges pointwise
to a complex-valued function $h$, where $h$ is continuous at $u = 0$, then
$h$ is the characteristic function of some bounded distribution function $F$, and
$F_n \xrightarrow{w} F$.


PROOF. The first assertion follows from the definition of weak convergence,
so assume $h_n(u) \to h(u)$ for all $u$, with $h$ continuous at the origin. We claim
that $\{F_n\}$ is tight. Using 7.2.7,

\[
\begin{aligned}
\int_{|x| \ge 1/u} dF_n(x) &\le \frac{k}{u} \int_0^u [h_n(0) - \operatorname{Re} h_n(v)]\,dv, \qquad u > 0 \\
&\to \frac{k}{u} \int_0^u [h(0) - \operatorname{Re} h(v)]\,dv \quad \text{as } n \to \infty,
\end{aligned}
\]

by the dominated convergence theorem.

Since $h$ is continuous at 0,
\[ \frac{k}{u} \int_0^u [h(0) - \operatorname{Re} h(v)]\,dv \to 0 \quad \text{as } u \to 0; \]
hence, given $\varepsilon > 0$, we may choose $u$ so small that
\[ \int_{|x| \ge 1/u} dF_n(x) < \varepsilon \quad \text{for all } n, \]

proving tightness. By 7.2.6, $F_n$ converges weakly to a distribution function $F$;
hence $h_n$ converges pointwise to the characteristic function of $F$. But we know
that $h_n \to h$, so that $h$ is the characteristic function of $F$. $\square$


The following variation of Lévy's theorem is often useful.
7.2.9 Theorem. Let $\{F_n\}$ be a bounded sequence of distribution functions,
and $\{h_n\}$ the corresponding sequence of characteristic functions. If $F$ is a
bounded distribution function with characteristic function $h$, then $F_n \xrightarrow{w} F$
iff $h_n(u) \to h(u)$ for all $u$, and in this case, $h_n$ converges to $h$ uniformly on
bounded intervals.


PROOF. If $F_n \xrightarrow{w} F$, then $h_n(u) \to h(u)$ for all $u$ by definition of weak
convergence. If $h_n(u) \to h(u)$ for all $u$, then by 7.2.8, $F_n$ converges weakly
to the distribution function whose characteristic function is $h$, namely, $F$.
Now let $I$ be a bounded interval of $\mathbb{R}$. Then

\[ h_n(u + \delta) - h_n(u) = \int_{\mathbb{R}} \bigl( e^{i(u + \delta)x} - e^{iux} \bigr)\,dF_n(x); \]

hence
\[ |h_n(u + \delta) - h_n(u)| \le \int_I |e^{i\delta x} - 1|\,dF_n(x) + 2F_n(\mathbb{R} - I). \]

Since $F_n \xrightarrow{w} F$, $\{F_n\}$ is relatively compact, and therefore tight by 7.2.4.
Thus if $\varepsilon > 0$ is given, we may choose $I$ so that $2F_n(\mathbb{R} - I) < \varepsilon/2$ for all $n$.
If $u \in \mathbb{R}$, and $M$ is a bound on $\{F_n(\mathbb{R}), n = 1, 2, \ldots\}$, then

\[ |h_n(u + \delta) - h_n(u)| \le M \sup_{x \in I} |e^{i\delta x} - 1| + \frac{\varepsilon}{2} < \varepsilon \quad \text{for small enough } \delta = \delta(\varepsilon). \]

Thus the $h_n$ are equicontinuous on $\mathbb{R}$. But equicontinuity and pointwise
convergence imply uniform convergence on compact sets (see Ash, 1993, 7.4,
Problem 3); hence $h_n \to h$ uniformly on $I$. $\square$
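
Uniform convergence on bounded intervals can be observed in a familiar case. The sketch below assumes $F_n$ is the binomial distribution with parameters $n$ and $\lambda/n$ and $F$ is the Poisson distribution with mean $\lambda = 2$ (a standard example, not taken from the text), and measures the largest gap between $h_n$ and $h$ on a grid in $[-10, 10]$.

```python
import cmath

lam = 2.0   # assumed Poisson mean for this illustration

def h_binomial(u, n):
    """Characteristic function of Binomial(n, lam/n)."""
    p = lam / n
    return (1 - p + p * cmath.exp(1j * u)) ** n

def h_poisson(u):
    """Characteristic function of Poisson(lam)."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1))

grid = [-10 + 0.05 * j for j in range(401)]   # grid on the bounded interval [-10, 10]

def sup_gap(n):
    """Largest value of |h_n(u) - h(u)| over the grid."""
    return max(abs(h_binomial(u, n) - h_poisson(u)) for u in grid)

gaps = {n: sup_gap(n) for n in (10, 100, 10000)}
print(gaps)   # the gap should shrink as n grows
```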


Problems 


1. If $X$ has density $(1/\pi)(1 - \cos x)/x^2$, the characteristic function of $X$ is
$h(u) = 1 - |u|$, $|u| \le 1$; $h(u) = 0$, $|u| > 1$. (See Ash, 1970, p. 166, Problem 8.)
It is possible to construct a different characteristic function $f$ that
agrees with $h$ on $[-1, 1]$, as follows.

Show that the Fourier series expansion on the interval $[-1, 1]$ of $f(u)
= 1 - |u|$ (extended periodically) is

\[ f(u) = \frac{1}{2} + \sum_{n=-\infty}^{\infty} \frac{2}{\pi^2 (2n + 1)^2} \exp(i(2n + 1)\pi u). \]

[Since $f$ is continuous and of bounded variation, and $f(-1) = f(1)$, the
Fourier series converges uniformly to $f$; see Titchmarsh (1939, p. 410).]
Thus if $X$ is a discrete random variable with $P\{X = 0\} = \tfrac12$,
$P\{X = (2n + 1)\pi\} = 2\pi^{-2}(2n + 1)^{-2}$, $n$ an integer, then $X$ has
characteristic function $f$. Since $f$ is periodic, we have $f \ne h$, as desired.


2. Using Problem 1, give examples to show that the following results are
possible (the $h$'s are characteristic functions of random variables).

(a) $h_1 h_2 = h_1 h_3$ does not imply $h_2 = h_3$.

(b) $h_n \to h$ on $[-1, 1]$ does not imply $h_n \to h$ everywhere.

3. Give an example of a sequence of characteristic functions converging
pointwise to a function that is not a characteristic function.


4. (a) If $\{a_n\}$ is a sequence of complex numbers and $\exp(iua_n)$ converges
to a (finite) limit $g(u)$ for almost all $u$ in the open interval $I \subset \mathbb{R}$,
show that $\{a_n\}$ converges.

(b) If $X_n \equiv n$, so that $h_n(u) = e^{iun}$, $n = 1, 2, \ldots$, show that the sequence
of characteristic functions $h_n$ has no pointwise convergent subsequence.
Thus the corresponding sequence of distribution functions
has no weakly convergent subsequence, as was verified directly in
7.2.2.
5. Let $F_0, F_1, F_2, \ldots$ be distribution functions on $\mathbb{R}$, and assume $F_n(-\infty)
= 0$, $n = 0, 1, \ldots$, $F_n(\infty) = 1$, $n = 1, 2, \ldots$. Give examples to show that
the following situations are possible.

(a) $F_n(x) \to F_0(x)$ for all $x \in \mathbb{R}$ at which $F_0$ is continuous, but $F_n$ does
not converge weakly to $F_0$.

(b) $F_n(b) - F_n(a) \to F_0(b) - F_0(a)$ for all $a, b \in \mathbb{R}$ at which $F_0$ is
continuous, but $\lim_{n \to \infty} F_n(x)$ does not exist for any $x \in \mathbb{R}$. In particular,
$F_n$ does not converge weakly to $F_0$.


6. Let $F_0, F_1, F_2, \ldots$ be distribution functions on $\mathbb{R}$, and assume that
$F_n(\infty) - F_n(-\infty) \le M < \infty$ for all $n$, and $F_n(a, b] \to F_0(a, b]$ for all
$a, b \in \mathbb{R}$ at which $F_0$ is continuous. Show that $F_n$ converges weakly to
$F_0$ iff $\{F_n\}$ is tight.


7. Let $F_0, F_1, F_2, \ldots$ be distribution functions on $\mathbb{R}$, and assume that
$F_n(\infty) - F_n(-\infty) \le M < \infty$ for all $n$. Show that $F_n$ converges weakly
to $F_0$ iff $F_n(a, b] \to F_0(a, b]$ for all $a, b \in \mathbb{R}$ at which $F_0$ is continuous,
and $F_n(\infty) - F_n(-\infty) \to F_0(\infty) - F_0(-\infty)$.

8. Let $Y_1, Y_2, \ldots$ be independent random variables, and let $\mathcal{F}_n$ be the $\sigma$-field
$\mathcal{F}(Y_1, \ldots, Y_n)$, $n = 1, 2, \ldots$. If $h_k$ is the characteristic function of $Y_k$ and
$u$ is a fixed real number such that $h_k(u) \ne 0$, $k = 1, 2, \ldots$, define

\[ X_n = \prod_{k=1}^{n} \frac{1}{h_k(u)} \exp(iuY_k). \]

(a) Show that $E(X_{n+1} \mid \mathcal{F}_n) = X_n$ a.e., and hence $\{X_n, \mathcal{F}_n\}$ is a complex-valued
martingale. (In other words, $\{\operatorname{Re} X_n, \mathcal{F}_n\}$ and $\{\operatorname{Im} X_n, \mathcal{F}_n\}$ are
martingales.)

(b) Assume that $\sum_{k=1}^{n} Y_k \xrightarrow{d} X$. Show that there is an open interval
$I \subset \mathbb{R}$, with $0 \in I$, such that for each $u \in I$, $\exp[iu \sum_{k=1}^{n} Y_k(\omega)]$
converges for almost every $\omega$.

(c) Under the hypothesis of part (b), show that for almost every $\omega$,
$\exp[iu \sum_{k=1}^{n} Y_k(\omega)]$ converges for almost every $u \in I$ (Lebesgue
measure). Thus by Problem 4, $\sum_{k=1}^{n} Y_k(\omega)$ converges a.e. to a finite
limit.

(d) Conclude that for a series of independent random variables, convergence
in distribution, convergence in probability, and convergence
almost everywhere are equivalent.


7.3 CONVERGENCE TO A NORMAL DISTRIBUTION 

Let $X_1, X_2, \ldots$ be independent random variables, with each $X_k$ having finite
mean $m_k$ and finite variance $\sigma_k^2$. Let $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$; then
$E(S_n) = \sum_{k=1}^{n} m_k$, $\operatorname{Var} S_n = \sum_{k=1}^{n} \sigma_k^2$. We consider the normalized sum
$T_n = c_n^{-1}(S_n - E(S_n))$, where $c_n^2 = \operatorname{Var} S_n$, which has mean 0 and variance 1.
[To avoid degeneracy, we assume that $c_n > 0$ for sufficiently large $n$. In fact the Lindeberg
hypothesis (7.3.1) will force $c_n$ to approach $\infty$ as $n \to \infty$.]

If $X^*$ is a random variable having the normal distribution with mean 0 and
variance 1, so that the distribution function of $X^*$ is

\[ F^*(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\bigl(-\tfrac12 t^2\bigr)\,dt, \]

we ask for conditions under which $T_n$ converges in distribution to $X^*$.

A long series of preliminary results was derived before a satisfactory solution
was obtained, giving conditions that are sufficient and “almost” necessary
for convergence to $X^*$. We consider sufficiency first.


7.3.1 Lindeberg's Theorem. Let $S_n = X_1 + \cdots + X_n$, $n = 1, 2, \ldots$, where
the $X_k$ are independent random variables with finite mean $m_k$ and finite
variance $\sigma_k^2$. Let $T_n = c_n^{-1}(S_n - E(S_n))$, where $c_n^2 = \operatorname{Var} S_n = \sum_{k=1}^{n} \sigma_k^2$, and
let $F_k$ be the distribution function of $X_k$. If for every $\varepsilon > 0$,

\[ \frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) \to 0 \quad \text{as } n \to \infty, \]

then $T_n$ converges in distribution to a random variable $X^*$ that is normal with
mean 0 and variance 1.


Before proving the theorem, we examine some of its implications. 


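Lindeberg's theorem implies that $T_n \xrightarrow{d} X^*$ under any one of the following
conditions.

1. The uniformly bounded case. Assume $|X_k| \le M$ for all $k$, and $c_n \to \infty$.
Then

\[ \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) = E\left[(X_k - m_k)^2 I_{\{|X_k - m_k| \ge \varepsilon c_n\}}\right] \le (2M)^2 P\{|X_k - m_k| \ge \varepsilon c_n\} \le \frac{(2M)^2 \sigma_k^2}{\varepsilon^2 c_n^2} \]

by Chebyshev's inequality. Thus

\[ \frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) \le \frac{(2M)^2}{\varepsilon^2 c_n^2} \cdot \frac{1}{c_n^2} \sum_{k=1}^{n} \sigma_k^2 = \frac{(2M)^2}{\varepsilon^2 c_n^2} \to 0. \]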

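2. The identically distributed case. Assume that the $X_k$ are iid, with finite
mean $m$ and finite variance $\sigma^2 > 0$. If $F$ is the distribution function of the $X_k$,
then

\[ \frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) = \frac{1}{n\sigma^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m| \ge \varepsilon \sigma \sqrt{n}\}} (x - m)^2 \, dF(x) = \frac{1}{\sigma^2} \int_{\{x:\, |x - m| \ge \varepsilon \sigma \sqrt{n}\}} (x - m)^2 \, dF(x) \to 0, \]

since $\sigma^2$ is finite and $\{x : |x - m| \ge \varepsilon \sigma \sqrt{n}\} \downarrow \emptyset$ as $n \to \infty$.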
3. The Bernoulli case. Let $S_n$ be the number of successes in $n$ Bernoulli
trials, with probability of success $p$ on a given trial. We may write

\[ S_n = X_1 + \cdots + X_n, \]

where the $X_k$ are independent and $P\{X_k = 1\} = p$, $P\{X_k = 0\} = q = 1 - p$.
(We may take $X_k$ as the indicator of a success on trial $k$.) Thus case 2
applies, with $m = E(X_k) = p$, $\sigma^2 = E(X_k^2) - [E(X_k)]^2 = p(1 - p)$, $E(S_n) = nm = np$,
$c_n^2 = n\sigma^2 = np(1 - p)$. Thus

\[ T_n = \frac{S_n - np}{(npq)^{1/2}} \]

and $T_n \xrightarrow{d} X^*$, that is, $P\{T_n \le x\} \to F^*(x)$ for all $x$.
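As a small numerical illustration of case 3 (a sketch only; the function name and the chosen values of n, p, and eps are assumptions made for this example, not part of the text), the Lindeberg sum for iid Bernoulli summands can be evaluated exactly, because $X_k - p$ takes only the two values $q$ and $-p$. Once $\varepsilon c_n$ exceeds $\max(p, q)$ the truncated integral is empty and the sum is exactly 0.

```python
import math


def lindeberg_sum_bernoulli(n, p, eps):
    """Exact Lindeberg sum (1/c_n^2) * sum_k E[(X_k - p)^2; |X_k - p| >= eps*c_n]
    for n iid Bernoulli(p) summands (illustrative helper, not from the text)."""
    q = 1.0 - p
    var = p * q                              # sigma^2 = p(1 - p)
    threshold = eps * math.sqrt(n * var)     # eps * c_n, with c_n^2 = n * sigma^2
    truncated = 0.0
    for value, prob in ((1.0, p), (0.0, q)):
        dev = value - p                      # X_k - m takes the values q and -p
        if abs(dev) >= threshold:
            truncated += dev * dev * prob
    # n identical terms divided by c_n^2 = n * var
    return truncated / var


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, lindeberg_sum_bernoulli(n, 0.5, 0.1))
```

For small n the sum equals 1 (no mass is truncated away); for large n it drops to 0, in line with the limit required by the Lindeberg hypothesis.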




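4. Lyapunov's condition. Assume that

\[ \frac{1}{c_n^{2+\delta}} \sum_{k=1}^{n} E\left[|X_k - m_k|^{2+\delta}\right] \to 0 \quad \text{for some } \delta > 0. \]

Then

\[ E\left[|X_k - m_k|^{2+\delta}\right] = \int_{-\infty}^{\infty} |x - m_k|^{2+\delta} \, dF_k(x) \ge \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} |x - m_k|^{\delta} |x - m_k|^{2} \, dF_k(x) \ge \varepsilon^{\delta} c_n^{\delta} \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x). \]

Thus

\[ \frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) \le \frac{1}{c_n^2} \sum_{k=1}^{n} \frac{E\left[|X_k - m_k|^{2+\delta}\right]}{\varepsilon^{\delta} c_n^{\delta}} = \frac{\sum_{k=1}^{n} E\left[|X_k - m_k|^{2+\delta}\right]}{\varepsilon^{\delta} c_n^{2+\delta}} \to 0. \]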
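PROOF OF THEOREM 7.3.1. We may assume without loss of generality that all
$m_k = 0$. For if we have proved the theorem under this restriction, let

\[ X_k' = X_k - m_k \quad (\text{so that } E X_k' = 0), \qquad S_n' = \sum_{k=1}^{n} X_k', \qquad T_n' = \frac{S_n' - E S_n'}{c_n'} = \frac{S_n - E S_n}{c_n} = T_n, \]

where $(c_n')^2 = \operatorname{Var} S_n' = c_n^2$. Since

\[ \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) = E\left[(X_k')^2 I_{\{|X_k'| \ge \varepsilon c_n'\}}\right] = \int_{\{x:\, |x| \ge \varepsilon c_n'\}} x^2 \, dF_k'(x), \]

where $F_k'$ is the distribution function of $X_k'$, the Lindeberg hypothesis applies to the random variables $X_k'$; hence
$T_n' \xrightarrow{d} X^*$. But then $T_n \xrightarrow{d} X^*$, as desired.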

The following estimates will be needed. If $y$ is any real number and $z$ a
complex number with $|z| \le 1/2$,

\[ e^{iy} = 1 + iy + \frac{\theta y^2}{2}, \tag{1} \]

\[ e^{iy} = 1 + iy - \frac{y^2}{2} + \frac{\theta_1 |y|^3}{6}, \tag{2} \]

where $\theta$ and $\theta_1$ depend on $y$, and $|\theta| \le 1$, $|\theta_1| \le 1$;

\[ \operatorname{Log}(1 + z) = z + \theta' |z|^2, \tag{3} \]

where "Log" denotes the principal branch of the logarithm and $|\theta'| \le 1$, $\theta'$
depending on $z$. (These formulas are exercises in calculus; for full details see
Ash, 1970, p. 173.)

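Throughout the proof, $h_k$ will denote the characteristic function of $X_k$, and
$u$ will be a fixed real number. The characteristic function of $T_n = S_n / c_n$ is

\[ h_{T_n}(u) = E\left(e^{iuT_n}\right) = E\left(e^{iuS_n/c_n}\right) = h_{S_n}\left(\frac{u}{c_n}\right) = \prod_{k=1}^{n} h_k\left(\frac{u}{c_n}\right). \tag{4} \]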
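By (1) and (2),

\[ h_k(u) = \int_{-\infty}^{\infty} e^{iux} \, dF_k(x) = \int_{|x| \ge \varepsilon c_n} \left(1 + iux + \frac{\theta u^2 x^2}{2}\right) dF_k(x) + \int_{|x| < \varepsilon c_n} \left(1 + iux - \frac{u^2 x^2}{2} + \frac{\theta_1 |u|^3 |x|^3}{6}\right) dF_k(x), \]

$|\theta|, |\theta_1| \le 1$, with $\theta, \theta_1$ depending on $ux$. Since

\[ \int_{-\infty}^{\infty} iux \, dF_k(x) = iu E X_k = 0, \]

we may drop the terms involving $iux$. Thus

\[ h_k\left(\frac{u}{c_n}\right) = 1 + \frac{u^2}{2c_n^2} \int_{|x| \ge \varepsilon c_n} \theta x^2 \, dF_k(x) - \frac{u^2}{2c_n^2} \int_{|x| < \varepsilon c_n} x^2 \, dF_k(x) + \frac{|u|^3}{6c_n^3} \int_{|x| < \varepsilon c_n} \theta_1 |x|^3 \, dF_k(x). \tag{5} \]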
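Now

\[ \left| \frac{1}{2} \int_{|x| \ge \varepsilon c_n} \theta x^2 \, dF_k(x) \right| \le \frac{1}{2} \int_{|x| \ge \varepsilon c_n} x^2 \, dF_k(x); \]

hence

\[ \frac{1}{2} \int_{|x| \ge \varepsilon c_n} \theta x^2 \, dF_k(x) = \theta_2 \int_{|x| \ge \varepsilon c_n} x^2 \, dF_k(x), \quad \text{where } |\theta_2| \le \frac{1}{2}. \]

Similarly,

\[ \left| \frac{1}{6} \int_{|x| < \varepsilon c_n} \theta_1 |x|^3 \, dF_k(x) \right| \le \frac{1}{6} \int_{|x| < \varepsilon c_n} |x|^3 \, dF_k(x) \le \frac{1}{6} \int_{|x| < \varepsilon c_n} \frac{\varepsilon c_n}{|x|} |x|^3 \, dF_k(x) \]

(note that $|x| < \varepsilon c_n$ implies $\varepsilon c_n / |x| > 1$). Hence

\[ \frac{1}{6} \int_{|x| < \varepsilon c_n} \theta_1 |x|^3 \, dF_k(x) = \theta_3 \int_{|x| < \varepsilon c_n} \varepsilon c_n x^2 \, dF_k(x), \quad |\theta_3| \le \frac{1}{6}. \]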



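Thus if we set

\[ \alpha_{nk} = \frac{1}{c_n^2} \int_{|x| \ge \varepsilon c_n} x^2 \, dF_k(x), \qquad \beta_{nk} = \frac{1}{c_n^2} \int_{|x| < \varepsilon c_n} x^2 \, dF_k(x) \quad (\le \varepsilon^2), \]

(5) becomes

\[ h_k\left(\frac{u}{c_n}\right) = 1 + \gamma_{nk}, \tag{6} \]

where

\[ \gamma_{nk} = \theta_2 u^2 \alpha_{nk} - \frac{u^2}{2} \beta_{nk} + |u|^3 \varepsilon \theta_3 \beta_{nk}. \tag{7} \]

Now $\sum_{k=1}^{n} \alpha_{nk} \to 0$ by hypothesis, and

\[ \sum_{k=1}^{n} (\alpha_{nk} + \beta_{nk}) = \frac{1}{c_n^2} \sum_{k=1}^{n} \sigma_k^2 = 1. \]

Thus by (7) and the fact that $\beta_{nk} \le \varepsilon^2$, we have

\[ \max_{k} |\gamma_{nk}| \le u^2 \varepsilon^2 + |u|^3 \varepsilon^3 \quad \text{and} \quad \sum_{k=1}^{n} |\gamma_{nk}| \le u^2 + |u|^3 \varepsilon \]

for sufficiently large $n$. By (3),

\[ \frac{u^2}{2} + \sum_{k=1}^{n} \operatorname{Log}(1 + \gamma_{nk}) = \frac{u^2}{2} + \sum_{k=1}^{n} \gamma_{nk} + \theta' \sum_{k=1}^{n} |\gamma_{nk}|^2, \quad |\theta'| \le 1. \]

Now

\[ \sum_{k=1}^{n} |\gamma_{nk}|^2 \le \left(\max_{k} |\gamma_{nk}|\right) \sum_{k=1}^{n} |\gamma_{nk}|, \tag{8} \]

and it follows from (8) that if $\delta > 0$ is given, and $\varepsilon$ is chosen sufficiently small
(depending on $\delta$ and $u$), then

\[ \left| \frac{u^2}{2} + \sum_{k=1}^{n} \operatorname{Log}(1 + \gamma_{nk}) \right| \]

will be less than $\delta$ for sufficiently large $n$. Since the exponential function is
continuous, we obtain from (6) that

\[ \exp\left(\frac{u^2}{2}\right) \prod_{k=1}^{n} h_k\left(\frac{u}{c_n}\right) \to 1; \]

in other words [see (4)], $h_{T_n}(u) \to \exp(-u^2/2)$, the characteristic function of
$X^*$. By 7.2.9, $T_n \xrightarrow{d} X^*$. □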


There is another proof of the Lindeberg theorem that uses weak convergence 
directly, rather than via Lévy’s theorem; see Billingsley (1968, p. 42). 

We now show that the Lindeberg hypothesis is not necessary for convergence
to $X^*$. If the Lindeberg condition holds, we claim that for all $\varepsilon > 0$,

\[ P\left\{ \left| \frac{X_k - m_k}{c_n} \right| \ge \varepsilon \right\} \to 0 \quad \text{as } n \to \infty, \]

uniformly in $k$. This is referred to as uniform asymptotic negligibility (uan) of
the random variables $(X_k - m_k)/c_n$. For

\[ \frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) \ge \varepsilon^2 \sum_{k=1}^{n} P\{|X_k - m_k| \ge \varepsilon c_n\} \ge \varepsilon^2 \max_{1 \le k \le n} P\{|X_k - m_k| \ge \varepsilon c_n\}. \]

Thus, intuitively, the contribution of each $(X_k - m_k)/c_n$ is small relative to
the sum $(S_n - E S_n)/c_n$.

If we can construct an example of a sequence that is not uan but for
which $T_n \xrightarrow{d} X^*$, we then have convergence to $X^*$ without the Lindeberg
condition; here is one possibility. Let $X_1, X_2, \ldots$ be independent, normally
distributed random variables with zero mean. Let $\sigma_k^2 = 2^{k-2}$, $k \ge 2$; $\sigma_1^2 = 1$.
Then $c_n^2 = \sum_{k=1}^{n} \sigma_k^2 = 2^{n-1}$; hence $X_n / c_n$ is normal with mean 0 and variance
$\tfrac{1}{2}$. Therefore

\[ \max_{1 \le k \le n} P\left\{ \left| \frac{X_k}{c_n} \right| \ge \varepsilon \right\} \ge P\left\{ \left| \frac{X_n}{c_n} \right| \ge \varepsilon \right\}, \]

which is a positive constant not depending on $n$. Thus the $X_k / c_n$ are not uan,
although $T_n \xrightarrow{d} X^*$. [In fact $T_n$ is normal (0, 1) for all $n$.]

If we impose the uan requirement, the Lindeberg condition becomes necessary
and sufficient for convergence to $X^*$.
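The arithmetic behind this counterexample is easy to check by machine. The sketch below (helper names are assumptions introduced only for illustration) verifies with exact rational arithmetic that $c_n^2 = 2^{n-1}$ and that $\operatorname{Var}(X_n/c_n) = 1/2$ for $n \ge 2$, and then evaluates $P\{|X_n/c_n| \ge \varepsilon\}$ with the standard normal tail formula $P\{|N(0, s^2)| \ge \varepsilon\} = \operatorname{erfc}(\varepsilon / (s\sqrt{2}))$.

```python
import math
from fractions import Fraction


def sigma_sq(k):
    """Variance of X_k in the example: sigma_1^2 = 1, sigma_k^2 = 2^(k-2) for k >= 2."""
    return Fraction(1) if k == 1 else Fraction(2) ** (k - 2)


def c_sq(n):
    """c_n^2 = sum of the first n variances."""
    return sum(sigma_sq(k) for k in range(1, n + 1))


def tail_prob_last_term(n, eps):
    """P{|X_n / c_n| >= eps} for the centered normal X_n of the example."""
    ratio = float(sigma_sq(n) / c_sq(n))          # Var(X_n / c_n)
    return math.erfc(eps / math.sqrt(2.0 * ratio))


if __name__ == "__main__":
    for n in (2, 5, 20, 60):
        print(n, c_sq(n), sigma_sq(n) / c_sq(n), tail_prob_last_term(n, 0.5))
```

The last column does not depend on n, which is exactly the failure of uniform asymptotic negligibility described above.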


7.3.2 Theorem. Let $X_1, X_2, \ldots$ be independent random variables, with each
$X_k$ having finite mean $m_k$ and finite variance $\sigma_k^2$. Then the Lindeberg condition

\[ \frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m_k| \ge \varepsilon c_n\}} (x - m_k)^2 \, dF_k(x) \to 0 \quad \text{for all } \varepsilon > 0 \]

holds if and only if $T_n \xrightarrow{d} X^*$ and the $(X_k - m_k)/c_n$ are uan.


PROOF. The "only if" part follows from 7.3.1 and the above remarks.
The "if" part, due to Feller (1950), is rather lengthy, and is proved in
Appendix 3. □


In the Lindeberg theorem, we have normalized the sum $S_n$ in a special way,
that is, we have considered a sequence of random variables $a_n^{-1}(S_n - b_n)$,
where $b_n$ is the mean and $a_n$ the standard deviation of $S_n$. We might ask
whether a different choice of constants $a_n$ and $b_n$ would produce different
results, for example, convergence to a nonnormal random variable. Questions
of this nature may be handled by the "theorem on convergence of types,"
which we now develop.


7.3.3 Definition. Two random variables $X$ and $Y$ (or their distribution functions
$G$ and $F$) are said to be of the same positive type iff $X$ and $a^{-1}(Y - b)$
have the same distribution for some real $a$ and $b$, $a > 0$, that is, $G(x) = F(ax + b)$
for all $x$; $X$ and $Y$ are of the same type iff $X$ and $a^{-1}(Y - b)$
have the same distribution for some real $a, b$ with $a \ne 0$, not necessarily positive.
(The notation $X \overset{d}{=} Y$ will indicate that $X$ and $Y$ have the same distribution.)

The notion of type is preserved under convergence in distribution, as the 
following basic theorem shows. 


7.3.4 Convergence of Types Theorem. (a) Let $X_n \xrightarrow{d} X$, $Y_n \xrightarrow{d} Y$,
where for each $n$, $X_n$ and $Y_n$ are of the same positive type, with $X_n \overset{d}{=} a_n^{-1}(Y_n - b_n)$,
$a_n > 0$. Assume $X$ and $Y$ are nondegenerate, that is, not a.e. constant.
Then there are real numbers $a$ and $b$, with $a > 0$, such that $a_n \to a$,
$b_n \to b$, and $X \overset{d}{=} a^{-1}(Y - b)$; thus $X$ and $Y$ are of the same positive type.

(b) If $X_n \xrightarrow{d} X$, $Y_n \xrightarrow{d} Y$, where for each $n$, $X_n$ and $Y_n$ are of the
same type, with $X_n \overset{d}{=} a_n^{-1}(Y_n - b_n)$, $a_n \ne 0$, and $X$ and $Y$ are nondegenerate,
then there are real numbers $a$ and $b$, with $a \ne 0$, such that $|a_n| \to |a|$ and
$X \overset{d}{=} a^{-1}(Y - b)$; thus $X$ and $Y$ are of the same type.


The proof is an intricate exercise in real analysis, and is done in Appendix 4.


We can now consider the question raised earlier about the normalizing
constants. If $(a_n')^{-1}(S_n - b_n') \xrightarrow{d} Y$ and $a_n^{-1}(S_n - b_n) \xrightarrow{d} X$, where $Y$
is normal and $X$ is nondegenerate, then $X$ must be normal. For

\[ a_n^{-1}(S_n - b_n) = \frac{a_n'}{a_n} \left[ (a_n')^{-1}(S_n - b_n') - (a_n')^{-1}(b_n - b_n') \right]; \]

hence by the convergence of types theorem, $X$ and $a^{-1}(Y - b)$ have the same
distribution for some $a, b$, with $a \ne 0$. But a brief computation then shows
that $X$ is normal. (To do this, look at characteristic functions, or use the
technique of 4.9.4.) Thus if one set of constants produces a normal limit, all
nondegenerate limits are normal.

The restriction that $X$ be nondegenerate is necessary. For given any real
number $c$, it is always possible to choose the constants $a_n$ and $b_n$ so that
$a_n^{-1}(S_n - b_n) \xrightarrow{d} c$ (see Problem 1).

There is a tricky aspect of Theorem 7.3.2 that is worth mentioning. Under
the uan hypothesis, the Lindeberg condition is necessary and sufficient
for $T_n \xrightarrow{d} X^*$; thus if the $c_n^{-1}(X_k - m_k)$ are uan and the Lindeberg condition
fails, $c_n^{-1}(S_n - E(S_n))$ cannot converge in distribution to a normal (0, 1)
random variable. However, it is possible for $c_n^{-1}(S_n - E(S_n))$ to converge in
distribution to a random variable $X$ that is not degenerate and not normal (0, 1).
In Problem 4, an example is given in which $X$ is normal with a variance
unequal to 1.

In fact there are examples of independent (but not identically distributed)
random variables $X_1, X_2, \ldots$, each with finite mean and variance, such that
for some constants $a_n, b_n$ we have $a_n^{-1}(S_n - b_n) \xrightarrow{d} X$, where $X$ is not
degenerate and not normal. The construction of such examples is quite elaborate,
and we give the reference only: Gnedenko and Kolmogorov (1954, p. 152).


Problems 


1. Let $X_1, X_2, \ldots$ be an arbitrary sequence of random variables. Given any
real number $c$, show that constants $a_n$ and $b_n$ ($a_n > 0$) can always be
chosen so that $a_n^{-1}(X_1 + \cdots + X_n - b_n)$ converges in distribution to a
random variable $X \equiv c$.

2. Give an example of sequences of random variables $\{X_n\}$ and $\{Y_n\}$ such
that $X_n \overset{d}{=} a_n^{-1}(Y_n - b_n)$ for real numbers $a_n$ and $b_n$ ($a_n \ne 0$), $X_n \xrightarrow{d} X$,
$Y_n \xrightarrow{d} Y$ (so that $X$ and $Y$ are of the same type), but $|b_n|$ has no limit.

3. Let $\{X_{nk},\ n = 1, 2, \ldots,\ k = 1, \ldots, n\}$ be a double sequence of random
variables, and let $h_{nk}$ be the characteristic function of $X_{nk}$. Show that the
$X_{nk}$ are uan (in other words,

\[ \max_{1 \le k \le n} P\{|X_{nk}| \ge \varepsilon\} \to 0 \]

as $n \to \infty$ for every $\varepsilon > 0$) iff

\[ \max_{1 \le k \le n} |h_{nk}(u) - 1| \to 0 \]

as $n \to \infty$, and in this case, the convergence is uniform on any bounded
interval. (Use 7.2.7 in the "if" part.)


4. Let $X_1, X_2, \ldots$ be independent random variables, defined as follows:
$X_1 = \pm 1$ with equal probability. If $k > 1$, and $c$ is a fixed real number greater than 1,

\[ P\{X_k = 1\} = P\{X_k = -1\} = \frac{1}{2c}, \]

\[ P\{X_k = k\} = P\{X_k = -k\} = \frac{1}{2k^2}\left(1 - \frac{1}{c}\right), \]

\[ P\{X_k = 0\} = 1 - \frac{1}{c} - \frac{1}{k^2}\left(1 - \frac{1}{c}\right). \]

Define

\[ X_{nk}' = \begin{cases} X_k & \text{if } |X_k| \le \sqrt{n}, \\ 0 & \text{if } |X_k| > \sqrt{n}. \end{cases} \]

Establish the following:

(a) The $X_k / c_n$ satisfy the uan condition.

(b) The Lindeberg condition fails for the $X_k$, but holds for the $X_{nk}'$.
Furthermore, if $S_n' = \sum_{k=1}^{n} X_{nk}'$, then $S_n' / c_n' \xrightarrow{d}$ normal (0, 1),
where $(c_n')^2 = \operatorname{Var} S_n' \sim n/c$.

(c) If $S_n = \sum_{k=1}^{n} X_k$, then $P\{S_n \ne S_n'\} \to 0$ as $n \to \infty$.

(d) $\sqrt{c}\, S_n / \sqrt{n} \xrightarrow{d}$ normal (0, 1), but $S_n / \sqrt{n}$ does not converge in distribution to normal (0, 1).


7.4 STABLE DISTRIBUTIONS 

If $X_1, X_2, \ldots$ are independent, identically distributed random variables, with
finite mean $m$ and finite variance $\sigma^2$, we know from the previous section
that $(S_n - nm)/\sigma\sqrt{n} \xrightarrow{d} X^*$ normal (0, 1); hence any limiting distribution
of a sequence $a_n^{-1}(S_n - b_n)$ must be normal. If we drop the finite variance
requirement, it is possible to obtain a nonnormal limit. For example, let
the $X_k$ have the Cauchy density $f(x) = \theta / \pi(x^2 + \theta^2)$, $x \in \mathbb{R}$, $\theta$ a fixed positive
constant. The corresponding characteristic function is $h(u) = e^{-\theta|u|}$ (see
Ash, 1970, p. 161, for the computation). Therefore $S_n$ has characteristic function
$[h(u)]^n = e^{-n\theta|u|}$; hence $n^{-1}S_n$ has characteristic function $[h(u/n)]^n = e^{-\theta|u|}$.
Thus $n^{-1}S_n \xrightarrow{d} X$, where $X$ has the Cauchy density with parameter
$\theta$. Since $E(|X|) = \infty$, this does not contradict the previous results.
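The Cauchy computation can be checked numerically. The following sketch (function names and parameter values are assumptions chosen for illustration) evaluates $[h(u/n)]^n$ for the Cauchy characteristic function and, for contrast, for the standard normal one: the first is unchanged by averaging, while the second tends to 1, the characteristic function of the constant 0.

```python
import math


def cauchy_cf(u, theta):
    """Characteristic function exp(-theta*|u|) of the Cauchy density with parameter theta."""
    return math.exp(-theta * abs(u))


def normal_cf(u):
    """Characteristic function exp(-u^2/2) of the normal (0, 1) distribution."""
    return math.exp(-u * u / 2.0)


def cf_of_sample_mean(cf, u, n):
    """Characteristic function of S_n / n for iid summands with characteristic function cf."""
    return cf(u / n) ** n


if __name__ == "__main__":
    u, theta = 1.3, 2.0
    for n in (1, 10, 1000):
        cauchy_mean = cf_of_sample_mean(lambda v: cauchy_cf(v, theta), u, n)
        normal_mean = cf_of_sample_mean(normal_cf, u, n)
        print(n, cauchy_mean, normal_mean)
```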
The following investigation is suggested. Let $X_1, X_2, \ldots$ be iid random variables.
If $a_n^{-1}(S_n - b_n) \xrightarrow{d} X$, what are the possible distributions of $X$? (We
may assume that $a_n > 0$; for if negative $a_n$ are allowed, we consider the
two subsequences corresponding to $a_n > 0$ and $a_n < 0$.) In fact the possible
limiting distributions may be completely characterized, as follows:


7.4.1 Definition. A random variable $X$ (or its distribution function $F$, or its
characteristic function $h$) is said to be stable iff, whenever $X_1, \ldots, X_n$ are iid
random variables with distribution function $F$, then $S_n = X_1 + \cdots + X_n$ is of
the same positive type as $X$; in other words, $X \overset{d}{=} a_n^{-1}(S_n - b_n)$, or equivalently
$[h(u)]^n = \exp[ib_n u]\, h(a_n u)$, $u \in \mathbb{R}$, for appropriate $a_n > 0$ and $b_n$.

A sequence $\{a_n^{-1}(S_n - b_n)\}$, where $a_n > 0$ and $S_n = \sum_{k=1}^{n} X_k$, the $X_k$ iid,
is called a sequence of normed sums.


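7.4.2 Theorem. The random variable $X$ is stable iff there is a sequence of
normed sums converging in distribution to $X$.


PROOF. If $X$ is stable, let $X_1, \ldots, X_n$ be iid with $X_i \overset{d}{=} X$. Then
$a_n^{-1}(S_n - b_n) \overset{d}{=} X$ for appropriate $a_n > 0$ and $b_n$; in particular,
$a_n^{-1}(S_n - b_n) \xrightarrow{d} X$.

Conversely, assume $X_1, X_2, \ldots$ iid, with $V_n = a_n^{-1}(S_n - b_n) \xrightarrow{d} X$. If $X$ is degenerate,
it is stable, so assume $X$ nondegenerate. Fix the positive integer $r$, and define

\[ S_n^{(1)} = X_1 + \cdots + X_n, \quad S_n^{(2)} = X_{n+1} + \cdots + X_{2n}, \quad \ldots, \quad S_n^{(r)} = X_{(r-1)n+1} + \cdots + X_{rn}. \]

Then let

\[ W_n^{(r)} = \frac{S_n^{(1)} - b_n}{a_n} + \frac{S_n^{(2)} - b_n}{a_n} + \cdots + \frac{S_n^{(r)} - b_n}{a_n} = Z_n^{(1)} + \cdots + Z_n^{(r)}, \]

where $Z_n^{(1)}, \ldots, Z_n^{(r)}$ are independent. Now $Z_n^{(i)} \overset{d}{=} Z_n^{(1)} = V_n$ for all $i$; hence
$Z_n^{(i)} \xrightarrow{d} X$ for each $i$. It follows from 7.2.9 that $W_n^{(r)} \xrightarrow{d} Z_1 + \cdots + Z_r$,
where $Z_1, \ldots, Z_r$ are iid with $Z_i \overset{d}{=} X$. But we may also write

\[ W_n^{(r)} = \frac{X_1 + \cdots + X_{rn} - r b_n}{a_n} = \frac{a_{rn}}{a_n} \left( \frac{X_1 + \cdots + X_{rn} - b_{rn}}{a_{rn}} + \frac{b_{rn} - r b_n}{a_{rn}} \right) = \alpha_n^{(r)} V_{rn} + \beta_n^{(r)}, \]

where $\alpha_n^{(r)} = a_{rn}/a_n > 0$ and $\beta_n^{(r)} = (b_{rn} - r b_n)/a_n$. To summarize:

\[ V_{rn} = \frac{W_n^{(r)} - \beta_n^{(r)}}{\alpha_n^{(r)}}, \qquad V_{rn} \xrightarrow{d} X, \qquad W_n^{(r)} \xrightarrow{d} Z_1 + \cdots + Z_r. \]

By 7.3.4(a), $\alpha_n^{(r)}$ approaches a limit $\alpha_r > 0$, $\beta_n^{(r)}$ approaches a limit $\beta_r$, and
$X \overset{d}{=} (\alpha_r)^{-1}(Z_1 + \cdots + Z_r - \beta_r)$. Thus $X$ is stable. □

In the above proof, we must have $\alpha_{rs} = \alpha_r \alpha_s$ for all positive integers $r$ and
$s$. For

\[ \alpha_n^{(rs)} = \frac{a_{rsn}}{a_n} = \frac{a_{rn}}{a_n} \cdot \frac{a_{rsn}}{a_{rn}} = \alpha_n^{(r)} \alpha_{rn}^{(s)}, \]

and we may let $n \to \infty$ to obtain $\alpha_{rs} = \alpha_r \alpha_s$.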
7.4.3 Examples. It can be shown [see, for example, Breiman (1968, p. 204)]
that $X$ is stable iff its characteristic function $h$ can be expressed as $h = e^{g}$,
where $g$ has one of the following two forms:

\[ g(u) = iu\beta - d|u|^{\alpha} \left( 1 + i\theta \frac{u}{|u|} \tan\frac{\pi\alpha}{2} \right), \quad 0 < \alpha < 1 \text{ or } 1 < \alpha \le 2, \quad \beta \in \mathbb{R}, \quad d \ge 0, \quad |\theta| \le 1, \tag{1} \]

\[ g(u) = iu\beta - d|u| \left( 1 + i\theta \frac{u}{|u|} \frac{2}{\pi} \ln|u| \right), \tag{2} \]

with $\beta, d, \theta$ as in (1). (Take $u/|u| = 0$ when $u = 0$.)

We shall prove only that a random variable with such a characteristic function
must be stable. If $X_1, \ldots, X_n$ are iid, with each $X_i$ having characteristic
function $h = e^{g}$, with $g$ of the form (1) or (2), let $\lambda = 1/\alpha$ in case (1), $\lambda = 1$
in case (2). Then in case (1),

\[ g(n^{\lambda}u) = iun^{\lambda}\beta - dn|u|^{\alpha} \left[ 1 + i\theta \frac{u}{|u|} \tan\frac{\pi\alpha}{2} \right] = ng(u) - iu\beta(n - n^{\lambda}), \]

and in case (2),

\[ g(n^{\lambda}u) = g(nu) = iun\beta - dn|u| \left[ 1 + i\theta \frac{u}{|u|} \frac{2}{\pi} (\ln n + \ln|u|) \right] = ng(u) - iud\theta \frac{2}{\pi} n \ln n. \]

Thus in either case,

\[ [h(u)]^{n} = \exp[ng(u)] = \exp[g(n^{\lambda}u)] \exp[ib_n u] = h(n^{\lambda}u) \exp[ib_n u], \]

where $b_n = \beta(n - n^{\lambda})$ in case (1), and $b_n = d\theta(2/\pi) n \ln n$ in case (2). Therefore
$S_n \overset{d}{=} a_n X + b_n$, with $a_n = n^{\lambda} = n^{1/\alpha}$; in particular, $X$ is stable.
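The two identities just derived can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the function names and the particular parameter values are invented for this example, and the code simply evaluates $n\,g(u)$ and $g(n^{\lambda}u) + i b_n u$ for forms (1) and (2) and reports the absolute difference, which should be at rounding-error level.

```python
import math


def sign(u):
    """u/|u|, taken to be 0 when u = 0, as in the text."""
    return (u > 0) - (u < 0)


def g_case1(u, alpha, beta, d, theta):
    """Exponent of form (1); alpha in (0, 1) or (1, 2]."""
    skew = 1j * theta * sign(u) * math.tan(math.pi * alpha / 2.0)
    return 1j * u * beta - d * abs(u) ** alpha * (1 + skew)


def g_case2(u, beta, d, theta):
    """Exponent of form (2), the case alpha = 1."""
    if u == 0:
        return 0j
    skew = 1j * theta * sign(u) * (2.0 / math.pi) * math.log(abs(u))
    return 1j * u * beta - d * abs(u) * (1 + skew)


def stability_gap_case1(u, n, alpha, beta, d, theta):
    """|n g(u) - (g(n^lambda u) + i b_n u)| with lambda = 1/alpha, b_n = beta (n - n^lambda)."""
    lam = 1.0 / alpha
    b_n = beta * (n - n ** lam)
    lhs = n * g_case1(u, alpha, beta, d, theta)
    rhs = g_case1(n ** lam * u, alpha, beta, d, theta) + 1j * b_n * u
    return abs(lhs - rhs)


def stability_gap_case2(u, n, beta, d, theta):
    """|n g(u) - (g(n u) + i b_n u)| with b_n = d theta (2/pi) n ln n."""
    b_n = d * theta * (2.0 / math.pi) * n * math.log(n)
    lhs = n * g_case2(u, beta, d, theta)
    rhs = g_case2(n * u, beta, d, theta) + 1j * b_n * u
    return abs(lhs - rhs)


if __name__ == "__main__":
    print(stability_gap_case1(0.7, 5, 1.5, 0.3, 2.0, 0.4))
    print(stability_gap_case2(-0.7, 5, 0.3, 2.0, 0.4))
```

Exponentiating both sides gives $[h(u)]^n = h(n^{\lambda}u)\exp[ib_n u]$, so a gap near zero is a numerical restatement of the stability property for these parameter choices.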

Now $X$ has a symmetric distribution ($P\{X \in B\} = P\{-X \in B\}$ for all
$B \in \mathcal{B}(\mathbb{R})$) iff $h$ is real-valued [see 7.1.5(d)], so the general form of the
symmetric stable characteristic function is

\[ h(u) = \exp[-d|u|^{\alpha}], \quad d \ge 0, \quad 0 < \alpha \le 2. \tag{3} \]

When $d = 0$, we have $X \equiv 0$. [Similarly, if $d = 0$ in (1) or (2), then $X \equiv \beta$.]
Thus assume $d > 0$. When $\alpha = 2$, $X$ is normal (0, 2d); when $\alpha = 1$, $X$ has the
Cauchy density with parameter $d$.
If $X$ is stable (not necessarily symmetric) and $0 < \alpha \le 1$, then $h$ is not
differentiable at $u = 0$, so by 7.1.5(e), $E(|X|) = \infty$. In the symmetric case,
$E(X)$ does not exist. For $X \overset{d}{=} -X$; hence $X^{+} \overset{d}{=} (-X)^{+} = X^{-}$, and therefore
$E(X^{+}) = E(X^{-})$, necessarily infinite. If $1 < \alpha < 2$, $h$ can be differentiated
once but not twice at $u = 0$, so that $E(X^2) = \infty$. This is to be expected, for if
$X$ has finite mean and variance, the fact that $X$ can be obtained as a limit of
a sequence of normed sums implies that $X$ must be normal (see the opening
paragraph of this section).

It can be shown (see Feller, 1966, p. 215) that if $X$ is stable, $X$ has a finite
$r$th moment for all $r \in (0, \alpha)$.


Problem 


1. The following problem shows that the functions $h(u) = \exp[-d|u|^{\alpha}]$,
$d \ge 0$, $0 < \alpha \le 2$, are characteristic functions (see 7.4.3). Let $X_1, \ldots, X_n$
be independent random variables, each uniformly distributed between $-n$
and $+n$. Define

\[ Y_n = k \sum_{j=1}^{n} \frac{\operatorname{sgn} X_j}{|X_j|^{r}}, \quad k > 0, \quad r > \frac{1}{2}. \]

If the $X_j$ are the positions of masses distributed at random on $[-n, n]$,
then $Y_n$ is the gravitational force at the origin, assuming an inverse $r$th
power law.

(a) Show that the characteristic function of $Y_n$ is

\[ h_n(u) = \left( 1 - \frac{1}{n} \left[ \int_{0}^{\infty} [1 - \cos(kux^{-r})] \, dx - g(n) \right] \right)^{n}, \]

where $g(n) \to 0$ as $n \to \infty$.

(b) Show that $h_n(u) \to h(u) = \exp\left( -\int_{0}^{\infty} [1 - \cos(kux^{-r})] \, dx \right)$ as $n \to \infty$.

(c) Make the change of variable $y = |u|^{1/r} k^{1/r} x^{-1}$ to show that $h(u)$
is of the form $\exp[-d|u|^{\alpha}]$, $d > 0$, $0 < \alpha < 2$. (The case $\alpha = 2$
corresponds to a normal distribution, and $d = 0$ to a degenerate
distribution, so these characteristic functions are automatically
realizable.)


7.5 INFINITELY DIVISIBLE DISTRIBUTIONS

There are limit laws that do not fit into any of the categories we have considered
so far. For example, let $T_n$ be the number of successes in $n$ Bernoulli
trials, with probability $p_n$ of success on a given trial. Then $T_n$ has the binomial
distribution:

\[ P\{T_n = k\} = \binom{n}{k} p_n^{k} (1 - p_n)^{n-k}, \quad k = 0, 1, \ldots, n. \]

If we let $n \to \infty$, $p_n \to 0$, with $np_n \to \lambda$, then

\[ P\{T_n = k\} \to \frac{e^{-\lambda}\lambda^{k}}{k!}, \quad k = 0, 1, \ldots \]

(see Ash, 1970, p. 95, for details). A discrete random variable $X$ with
$P\{X = k\} = e^{-\lambda}\lambda^{k}/k!$, $k = 0, 1, \ldots$, is said to have the Poisson distribution. In this
case,

\[ F_{T_n}(k) = P\{T_n \le k\} = \sum_{j=0}^{k} P\{T_n = j\} \to \sum_{j=0}^{k} P\{X = j\} = P\{X \le k\} = F_X(k); \]

hence $T_n \xrightarrow{d} X$.
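The Poisson limit is easy to observe numerically. The sketch below (helper names and the choice $\lambda = 2$ are assumptions made only for this illustration) compares the binomial probabilities with $p_n = \lambda/n$ to the Poisson probabilities and reports the largest pointwise gap over $k = 0, \ldots, 20$.

```python
import math


def binomial_pmf(n, p, k):
    """P{T_n = k} for n Bernoulli trials with success probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(lam, k):
    """P{X = k} for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, lam, kmax=20):
    """Largest |binomial - Poisson| over k = 0..min(n, kmax), with p_n = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k))
        for k in range(min(n, kmax) + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, max_pmf_gap(n, 2.0))
```

The gap shrinks as n grows, which is the pointwise convergence of the probabilities used above to conclude that the distribution functions converge.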


Now this can be regarded as a limit law for sums of independent random
variables. We may represent $T_n$ as $X_{n1} + X_{n2} + \cdots + X_{nn}$, where $X_{ni}$,
the number of successes on trial $i$, or equivalently, the indicator of the event
{success on trial $i$}, is 1 with probability $p_n$ and 0 with probability
$1 - p_n$, and the $X_{ni}$ are independent. The difference between this case and
the previous ones is that we are no longer dealing with a single sequence
of random variables; $T_n$ is not simply $X_1 + \cdots + X_n$, where $X_1, X_2, \ldots$ are
independent. Instead, for each $n$ we have a different sequence $X_{n1}, \ldots, X_{nn}$.

We may construct a model that includes this case as well as all previous
results, as follows. Consider a triangular array:

\[
\begin{array}{lllll}
X_{11} \\
X_{21} & X_{22} \\
X_{31} & X_{32} & X_{33} \\
\vdots \\
X_{n1} & X_{n2} & X_{n3} & \cdots & X_{nn} \\
\vdots
\end{array}
\]

We assume that for each $n$, $X_{n1}, \ldots, X_{nn}$ are independent. (We say nothing
as yet about any relation between rows.) We set $T_n = X_{n1} + \cdots + X_{nn}$; we
want to investigate convergence in distribution of the sequence $\{T_n\}$.

Notice that if we are interested in sequences $\{a_n^{-1}(S_n - b_n)\}$, where $S_n$ is
the sum of independent random variables $X_1, \ldots, X_n$, we may construct an
appropriate triangular array; take

\[ X_{nk} = \frac{X_k - b_n/n}{a_n}, \quad k = 1, \ldots, n; \]

then

\[ T_n = \sum_{k=1}^{n} X_{nk} = a_n^{-1}(S_n - b_n). \]

Thus the triangular array scheme includes the previous models we have considered.

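Note also that the Lindeberg theorem holds for triangular arrays. If the $X_{nk}$
have finite mean $m_{nk}$ and finite variance $\sigma_{nk}^2$,

\[ c_n^2 = \operatorname{Var}\left( \sum_{k=1}^{n} X_{nk} \right) = \sum_{k=1}^{n} \sigma_{nk}^2, \]

and for every $\varepsilon > 0$,

\[ \frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{x:\, |x - m_{nk}| \ge \varepsilon c_n\}} (x - m_{nk})^2 \, dF_{nk}(x) \to 0 \quad \text{as } n \to \infty, \]

then

\[ c_n^{-1} \sum_{k=1}^{n} (X_{nk} - m_{nk}) \xrightarrow{d} X^* \text{ normal } (0, 1). \]

The proof is the same as in 7.3.1, with the distribution function $F_k$ replaced
by $F_{nk}$.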

A natural question is the characterization of the possible limiting distributions
of a triangular array; this problem was solved for normed sums in
Section 7.4. However, as it stands, the question is not sensible, even if we
require that the triangular array come from a single sequence of random variables.
Let $X$ be an arbitrary random variable, and take $X_1 = X$, $X_n \equiv 0$ for
$n \ge 2$, $b_n = 0$, $a_n = 1$. Then $a_n^{-1}(S_n - b_n) = X$, so any limit distribution is
possible. Thus some restriction must be imposed.

One way to take care of this difficulty is to assume the hypothesis of uniform
asymptotic negligibility, as we did in considering the converse of the
Lindeberg theorem:

\[ \max_{1 \le i \le n} P\{|X_{ni}| \ge \varepsilon\} \to 0 \quad \text{as } n \to \infty \text{ for every } \varepsilon > 0. \]

However, we will sacrifice generality for simplicity, and assume that for each
$n$, $X_{n1}, X_{n2}, \ldots, X_{nn}$ are identically distributed. We may then characterize the
possible limiting distributions.


7.5.1 Definition. A random variable $X$ (or its distribution function $F$, or
its characteristic function $h$) is said to be infinitely divisible iff for each $n$, $X$
has the same distribution as the sum of $n$ independent, identically distributed
random variables. In other words, for each $n$, we may write $h = (h_n)^n$, where
$h_n$ is the characteristic function of a random variable.
7.5.2 Theorem. The random variable $X$ is infinitely divisible iff there
is a triangular array, with $X_{n1}, \ldots, X_{nn}$ iid for each $n$, such that
$T_n = \sum_{k=1}^{n} X_{nk} \xrightarrow{d} X$.


PROOF. Let $X$ be infinitely divisible. For each $n$, we may write
$X \overset{d}{=} X_{n1} + \cdots + X_{nn}$, where the $X_{ni}$ are iid. Then $T_n = \sum_{i=1}^{n} X_{ni} \overset{d}{=} X$; hence
$T_n \xrightarrow{d} X$.

The converse is another application of Prokhorov's weak compactness theorem.
Assume we have a triangular array with the $X_{ni}$ iid for each $n$ and
$T_n \xrightarrow{d} X$. Fix the positive integer $r$; then

\[ T_{rn} = Z_n^{(1)} + \cdots + Z_n^{(r)}, \]

where

\[ Z_n^{(1)} = X_{rn,1} + \cdots + X_{rn,n}, \quad Z_n^{(2)} = X_{rn,n+1} + \cdots + X_{rn,2n}, \quad \ldots, \quad Z_n^{(r)} = X_{rn,(r-1)n+1} + \cdots + X_{rn,rn}. \]

Since $T_{rn} \xrightarrow{d} X$ as $n \to \infty$, it follows that $\{T_{rn},\ n = 1, 2, \ldots\}$ is relatively
compact. (This means that the associated sequence of distribution functions is
relatively compact.) By 7.2.4, $\{T_{rn},\ n = 1, 2, \ldots\}$ is tight. But

\[ \left( P\{Z_n^{(1)} > z\} \right)^{r} = P\{Z_n^{(1)} > z, \ldots, Z_n^{(r)} > z\} \quad \text{by independence of the } Z_n^{(i)} \]
\[ \le P\{T_{rn} > rz\}, \]

and similarly,

\[ \left( P\{Z_n^{(1)} < -z\} \right)^{r} \le P\{T_{rn} < -rz\}. \]

It follows that $\{Z_n^{(1)},\ n = 1, 2, \ldots\}$ is tight, and hence relatively compact by
7.2.4. Thus we have a subsequence $\{Z_n^{(1)},\ n = n_1, n_2, \ldots\}$ converging in
distribution to a random variable $Y$. But the $Z_n^{(i)}$, $i = 1, \ldots, r$, are iid; hence
$\{Z_n^{(i)},\ n = n_1, n_2, \ldots\} \xrightarrow{d} Y$ for each $i$. By 7.2.9, $T_{rn} \xrightarrow{d} Y_1 + \cdots + Y_r$ (along the subsequence), where
$Y_1, \ldots, Y_r$ are iid with $Y_i \overset{d}{=} Y$. But $T_{rn} \xrightarrow{d} X$; hence $X \overset{d}{=} Y_1 + \cdots + Y_r$. □
It can be shown (Gnedenko and Kolmogorov, 1954) that Theorem 7.5.2
still holds if the condition that for each $n$ the $X_{ni}$ have the same distribution
is replaced by the uan condition.


7.5.3 Examples of Infinitely Divisible Random Variables. (a) Every stable
random variable is infinitely divisible. This may be seen from the fact that
every stable $X$ is a limit in distribution of a sequence of normed sums, hence
a limit of row sums of a triangular array in which the $X_{ni}$, $i = 1, 2, \ldots, n$,
have the same distribution. Alternatively, if $X_1 + \cdots + X_n \overset{d}{=} a_n X + b_n$, then

\[ X \overset{d}{=} \sum_{i=1}^{n} \left( \frac{X_i}{a_n} - \frac{b_n}{n a_n} \right). \]

(b) A random variable of the Poisson type is infinitely divisible. Let $Y$
have the Poisson distribution: $P\{Y = k\} = e^{-\lambda}\lambda^{k}/k!$, $k = 0, 1, \ldots$. The characteristic
function of $Y$ is

\[ h(u) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{iu})^{k}}{k!} = \exp[\lambda(e^{iu} - 1)], \]

and it follows that if $Y_1, \ldots, Y_n$ are independent, with $Y_i$ Poisson with parameter
$\lambda_i$, $i = 1, \ldots, n$, then $Y_1 + \cdots + Y_n$ is Poisson with parameter $\lambda_1 + \cdots + \lambda_n$.
In particular, if $Y$ is Poisson with parameter $\lambda$, then $Y \overset{d}{=} Y_1 + \cdots + Y_n$,
where the $Y_i$ are iid, each Poisson with parameter $\lambda/n$. Thus $Y$ is infinitely
divisible. Now if $Y$ is infinitely divisible, so is $aY + b$ (a similar statement
holds for stable random variables); hence a random variable of the Poisson type
($aY + b$, $Y$ Poisson, $a \ne 0$) is infinitely divisible. The characteristic function
of $aY + b$ is $\exp[ibu + \lambda(e^{iau} - 1)]$.

(c) A random variable with the gamma distribution is infinitely divisible.
Let $X$ have density

\[ f(x) = \begin{cases} \dfrac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}, & x \ge 0, \\ 0, & x < 0, \end{cases} \]

where $\alpha, \beta > 0$. The characteristic function of $X$ is

\[ h(u) = (1 - i\beta u)^{-\alpha} = \left[ (1 - i\beta u)^{-\alpha/n} \right]^{n}; \]

hence $X$ has the same distribution as the sum of $n$ independent gamma-distributed
random variables with parameters $\alpha/n$ and $\beta$.
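Examples (b) and (c) both rest on the factorization $h = (h_n)^n$ of Definition 7.5.1. As a hedged numerical check (function names and sample parameters are assumptions for this illustration; the gamma characteristic function is evaluated with Python's principal-branch complex power), one can confirm that the $n$th power of the characteristic function with parameter $\lambda/n$ (respectively $\alpha/n$) reproduces the original.

```python
import cmath


def poisson_cf(u, lam):
    """Characteristic function exp[lam (e^{iu} - 1)] of the Poisson distribution."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1))


def gamma_cf(u, alpha, beta):
    """Characteristic function (1 - i beta u)^(-alpha) of the gamma distribution."""
    return (1 - 1j * beta * u) ** (-alpha)


def divisibility_gap_poisson(u, lam, n):
    """|h(u) - h_n(u)^n| where h_n is the Poisson cf with parameter lam / n."""
    return abs(poisson_cf(u, lam) - poisson_cf(u, lam / n) ** n)


def divisibility_gap_gamma(u, alpha, beta, n):
    """|h(u) - h_n(u)^n| where h_n is the gamma cf with parameters alpha / n and beta."""
    return abs(gamma_cf(u, alpha, beta) - gamma_cf(u, alpha / n, beta) ** n)


if __name__ == "__main__":
    print(divisibility_gap_poisson(0.9, 3.0, 7))
    print(divisibility_gap_gamma(0.8, 2.5, 1.5, 6))
```

Both gaps should be at the level of floating-point rounding, matching the algebraic identities displayed above.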


We now develop some general properties of infinitely divisible distributions. 


7.5.4 Theorem. If $h_1$ and $h_2$ are infinitely divisible characteristic functions,
so is $h_1 h_2$. If $h$ is infinitely divisible, then $\bar{h}$, the complex conjugate of $h$, and
$|h|^2$ are infinitely divisible.


PROOF. If $h_i = (h_{in})^n$, $i = 1, 2$, then $h_1 h_2 = (h_{1n} h_{2n})^n$; since $h_{1n} h_{2n}$ is the
characteristic function of the sum of two independent random variables with
characteristic functions $h_{1n}$ and $h_{2n}$, the first assertion is proved. If $X$ has
characteristic function $h$, then $-X$ has characteristic function $\bar{h}$ [see 7.1.5(c)];
thus if $h = (h_n)^n$, then $\bar{h} = (\bar{h}_n)^n$; hence $\bar{h}$ is infinitely divisible. Since
$|h|^2 = h\bar{h}$, $|h|^2$ is also infinitely divisible. □


If $\mathcal{C}$ is the entire class of characteristic functions of random variables, the
proof of 7.5.4 shows that if $h_1, h_2 \in \mathcal{C}$, then $h_1 h_2 \in \mathcal{C}$.

Also, if $h \in \mathcal{C}$, then $\bar{h} \in \mathcal{C}$ and $|h|^2 \in \mathcal{C}$. Furthermore, 7.2.8 implies that if
$h_n \in \mathcal{C}$, $n = 1, 2, \ldots$, and $h_n(u) \to h(u)$ for all $u$, where $h$ is continuous at the
origin, then $h \in \mathcal{C}$. A similar result holds for infinitely divisible characteristic
functions.


7.5.5 Theorem. If $h_n$ is an infinitely divisible characteristic function for each
$n = 1, 2, \ldots$, and $h_n(u) \to h(u)$ for all $u$, where $h$ is a characteristic function,
then $h$ is infinitely divisible.


PROOF. Let $Z_n$ be a random variable with characteristic function $h_n$,
$n = 1, 2, \ldots$. If $r$ is a fixed positive integer, then $Z_n \overset{d}{=} Z_n^{(1)} + \cdots + Z_n^{(r)}$, where
the $Z_n^{(i)}$, $i = 1, \ldots, r$, are iid. If $Z$ is a random variable with characteristic
function $h$, then $Z_n \xrightarrow{d} Z$ by 7.2.9, so that $\{Z_n\}$ is relatively compact, and
hence tight by 7.2.4. Just as in the proof of 7.5.2, it follows that $\{Z_n^{(1)}\}$ is
tight. By 7.2.4, we have a subsequence $\{Z_n^{(1)},\ n = n_1, n_2, \ldots\}$ converging in
distribution to a random variable $Y$; hence (again as in 7.5.2) $Z_n \xrightarrow{d} Y_1 + \cdots + Y_r$
along the subsequence, where $Y_1, \ldots, Y_r$ are iid with $Y_i \overset{d}{=} Y$. But $Z_n \xrightarrow{d} Z$; hence
$Z \overset{d}{=} Y_1 + \cdots + Y_r$. In other words, $h$ is infinitely divisible. □


Now if $h$ is infinitely divisible, a uniqueness question arises; namely, can
$h$ be represented in two different ways as the $n$th power of a characteristic
function? This is actually an exercise in complex variables, as follows. Let
$f$ and $g$ be continuous complex-valued functions on the connected set $S$,
with $f^n = g^n$; assume that $f(u) = g(u)$ for at least one $u \in S$. [In our case
$S = \mathbb{R}$ and $f^n = g^n = h$; since $f$ and $g$ are characteristic functions of random
variables, $f(0) = g(0) = 1$.] If $f$ and $g$ are never 0 on $S$, then $f = g$. Note that
$(f/g)^n = 1$, and therefore $f/g$ is a continuous map of $S$ into $\{\exp(i2\pi k/n),\ k = 0, 1, \ldots, n - 1\}$.
Since the image of $S$ under $f/g$ is connected, it must
consist of a single point; thus $f/g$ is a constant, necessarily 1 because $f$ and
$g$ agree at one point. Thus the representation of $h$ as the $n$th power of a
characteristic function is unique, provided we can establish that an infinitely
divisible characteristic function never vanishes.


7.5.6 Theorem. If $h$ is an infinitely divisible characteristic function, then $h$
is never 0.


PROOF. If $h = (h_n)^n$, then $|h|^2 = (|h_n|^2)^n$. Since $|h|^2$ is infinitely divisible by
7.5.4, we may as well assume that $h$ and the $h_n$ are real and nonnegative. Thus
$h_n = h^{1/n}$ ($= \exp[(1/n) \ln h]$), so if $h(u) > 0$, then $h_n(u) \to 1$, and if $h(u) = 0$,
then $h_n(u) = 0$ for all $n$. But $h(0) = 1$; hence $h(u) > 0$ in some neighborhood
of the origin. Thus $h_n$ converges to a function $g$ that is 1 in a neighborhood
of the origin. By 7.2.8, $g$ is a characteristic function, and hence continuous
everywhere. But $g$ takes on only the values 0 and 1, and hence $g \equiv 1$. Thus
for any $u$, $h_n(u) \to 1$, so that $h_n(u) \ne 0$ for sufficiently large $n$. Therefore
$h(u) = [h_n(u)]^n \ne 0$. □


Example 7.5.3(b) is basic in the sense that random variables of the Poisson 
type can be used as building blocks for arbitrary infinitely divisible random 
variables. 


7.5.7 Theorem. The random variable X is infinitely divisible iff there is a 


d 
sequence of sums yr Xnk —> X, where for each n, the X,,, are indepen- 
dent (not necessarily identically distributed) and each X,; is of the Poisson 


type. 


Proor. The “if” part follows from 7.5.3(b), the first assertion of 7.5.4, and 
7.5.5, so assume X infinitely divisible. Since h, the characteristic function of 
X, is continuous and never 0 (by 7.5.6), h has a continuous logarithm, to 
be denoted by log h. If we specify that log A(0) = log 1 = 0, the logarithm is 
determined uniquely. (See Ash, 1971, p. 49ff., for a discussion of continuous 
logarithms.) If h = (h,,)”, where h, 1s a characteristic function, then 


(Apy = exp (~ tog h) 
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hence as in the discussion before 7.5.6, 


l 
h, = exp G log a), 
n 


so for any fixed u € R, 
l 
n(h,(u) —1) =n (exp z log hou) — 1) 
n 


1 1 

=n G log h(u) +0 (=) since e = 1 +z +o(z) 
n n 

— log h(u) as n— oo. 


Thus

$$\log h(u) = \lim_{n\to\infty} n(h_n(u) - 1) = \lim_{n\to\infty} n\int_{-\infty}^{\infty} (e^{iux} - 1)\,dF_n(x),$$

where $F_n$ is the distribution function corresponding to $h_n$. It follows from
the dominated convergence theorem that for each $n$ we may select a positive
number $m = m(n)$ such that $m \to \infty$ as $n \to \infty$ and

$$\left|\int_{-\infty}^{\infty} (e^{iux} - 1)\,dF_n(x) - \int_{-m}^{m} (e^{iux} - 1)\,dF_n(x)\right| < \frac{1}{n^2} \quad \text{for all } u,$$

and we may then choose a positive integer $r = r(n)$ such that

$$\left|\int_{-m}^{m} (e^{iux} - 1)\,dF_n(x) - \sum_{k=1}^{r} (e^{iux_k} - 1)[F_n(x_k) - F_n(x_{k-1})]\right| < \frac{1}{n^2}$$

for all $u \in (-m, m)$, where $x_k = -m + 2mk/r$, $k = 0, 1, \ldots, r$.
It follows that we may obtain $h(u)$ as a pointwise limit of terms of the form

$$\prod_{k=1}^{r(n)} \exp[\lambda_{nk}(\exp(ia_{nk}u) - 1)],$$

where $\lambda_{nk} = n[F_n(x_k) - F_n(x_{k-1})]$ and $a_{nk} = x_k$. □
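The limit $n(h_n(u) - 1) \to \log h(u)$ used in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the two test cases (standard normal and Poisson with parameter 3) and the function names are illustrative assumptions, and the continuous logarithms are written down in closed form rather than computed from $h$.

```python
import cmath

def log_h_normal(u):
    """Continuous logarithm of the standard normal characteristic function exp(-u^2/2)."""
    return complex(-u * u / 2.0)

def log_h_poisson(u, lam=3.0):
    """Continuous logarithm of the Poisson(lam) characteristic function."""
    return lam * (cmath.exp(1j * u) - 1.0)

def scaled_gap(log_h_u, n):
    """Return n * (h_n(u) - 1), where h_n(u) = exp((1/n) log h(u))."""
    h_n = cmath.exp(log_h_u / n)
    return n * (h_n - 1.0)

def error(log_h_u, n):
    """Distance between n * (h_n(u) - 1) and its claimed limit log h(u)."""
    return abs(scaled_gap(log_h_u, n) - log_h_u)

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, error(log_h_normal(2.0), n), error(log_h_poisson(1.0), n))
```

The printed errors shrink roughly like $1/n$, which is what the expansion $e^z = 1 + z + o(z)$ predicts.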
We conclude this section by mentioning the Lévy-Khintchine representation:
The characteristic function $h$ is infinitely divisible iff

$$\log h(u) = i\beta u - \frac{\sigma^2 u^2}{2} + \int_{-\infty}^{\infty}\left(e^{iux} - 1 - \frac{iux}{1 + x^2}\right)\frac{1 + x^2}{x^2}\,d\lambda(x),$$

where $\beta \in \mathbb{R}$, $\sigma^2 \ge 0$, and $\lambda$ is a finite measure on $\mathscr{B}(\mathbb{R})$ such that $\lambda\{0\} = 0$.
The result is basic for a deeper study of the central limit theorem, in
particular for deriving conditions for convergence to a particular infinitely 
divisible distribution, analogous to the results on normal convergence in 8.3. 
Full details are given by Gnedenko and Kolmogorov (1954). Proofs of the 
Lévy—Khintchine representation are also given by Chung (1968) and Tucker 
(1967). 


Problems 


1. The random variable $X$ is said to have the geometric distribution iff
$P\{X = k\} = q^{k-1}p$, $k = 1, 2, \ldots$, where $0 < p < 1$, $q = 1 - p$. Show that
the associated characteristic function, given by

$$h(u) = pe^{iu}\sum_{k=1}^{\infty} (qe^{iu})^{k-1} = \frac{pe^{iu}}{1 - qe^{iu}},$$

is infinitely divisible (use 7.5.7).


2. Let $g(s) = \sum_{n=1}^{\infty} n^{-s}$, Re $s > 1$, be the Riemann zeta function. The series
converges uniformly for Re $s \ge 1 + \varepsilon$, any $\varepsilon > 0$; also,

$$g(s) = \prod_{k=1}^{\infty} \frac{1}{1 - p_k^{-s}},$$

where $p_k$ is the $k$th prime. (See Ash, 1971, Chapter 6 for details.) If $c$ is
a fixed real number greater than 1, show that $h(u) = g(c + iu)/g(c)$ is an
infinitely divisible characteristic function (use 7.5.7).

3. Give an example of an infinitely divisible characteristic function that is 
not stable. 

4. When characteristic functions are not easy to compute, the following
technique is sometimes useful for actually finding the distribution of a sum
of independent random variables.

Let $X$ and $Y$ be independent random variables, and let $Z = X + Y$. If
$X$, $Y$, and $Z$ have distribution functions $F_1$, $F_2$, and $F_3$, show that $F_3$ is
the convolution of $F_1$ and $F_2$ (notation: $F_3 = F_1 * F_2$), that is,

$$F_3(z) = \int_{-\infty}^{\infty} F_1(z - y)\,dF_2(y).$$

$F_3$ is also the convolution of $F_2$ and $F_1$, that is,

$$F_3(z) = \int_{-\infty}^{\infty} F_2(z - x)\,dF_1(x).$$

If $X$ [respectively, $Y$] has density $f_1$ [respectively, $f_2$], show that $Z$ has
density $f_3$, where

$$f_3(z) = \int_{-\infty}^{\infty} f_1(z - y)\,dF_2(y) \quad \left[\text{respectively, } f_3(z) = \int_{-\infty}^{\infty} f_2(z - x)\,dF_1(x)\right].$$

[If both $X$ and $Y$ have densities, replace $dF_1(x)$ by $f_1(x)\,dx$ and $dF_2(y)$
by $f_2(y)\,dy$.]

Intuitively, the probability that $X$ falls in $(x, x + dx]$ is $dF_1(x)$; given
that $X = x$, we have $Z \le z$ iff $Y \le z - x$, and this happens with probability
$F_2(z - x)$. Integrate over $x$ to obtain the total probability that $Z \le z$,
namely, $F_3 = F_2 * F_1$. The other formulas have a similar interpretation.

Note also that convolution is associative, that is, $F_1 * (F_2 * F_3)
= (F_1 * F_2) * F_3$. This is somewhat messy to prove directly, but a
probabilistic interpretation makes it transparent. For if $X_1$, $X_2$, and $X_3$ are
independent random variables with distribution functions $F_1$, $F_2$, and $F_3$,
respectively, then $F_1 * (F_2 * F_3)$ is the distribution function of $X_1 + (X_2
+ X_3)$, and $(F_1 * F_2) * F_3$ is the distribution function of $(X_1 + X_2) + X_3$.

Finally, we note that if $F$ is the distribution function of a random
variable, then $F$ is infinitely divisible iff for each $n$ there is a distribution
function $F_n$ (of a random variable) such that $F = F_n * F_n * \cdots * F_n$ ($n$
times).
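For integer-valued random variables the convolution integral reduces to a finite sum, which makes the formulas of Problem 4 easy to experiment with. The sketch below is an illustration under that assumption only; the helper names (`convolve`, `cdf`, `convolution_cdf`) and the die and coin examples are hypothetical and do not come from the text. Exact rational arithmetic is used so that associativity can be checked with `==`.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(pmf_x, pmf_y):
    """PMF of X + Y for independent X, Y, each given as {value: probability}."""
    out = defaultdict(Fraction)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(out)

def cdf(pmf, z):
    """Distribution function F(z) = P{X <= z}."""
    return sum(p for v, p in pmf.items() if v <= z)

def convolution_cdf(pmf_x, pmf_y, z):
    """F_3(z) = sum over y of F_1(z - y) P{Y = y}: the discrete form of
    the integral of F_1(z - y) dF_2(y)."""
    return sum(cdf(pmf_x, z - y) * py for y, py in pmf_y.items())

# Illustrative inputs: a fair die and a fair 0/1 coin.
die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

if __name__ == "__main__":
    two_dice = convolve(die, die)
    print(cdf(two_dice, 7), convolution_cdf(die, die, 7))
    print(convolve(die, convolve(coin, die)) == convolve(convolve(die, coin), die))
```

The two ways of computing $F_3(7)$ agree, and the last line confirms associativity on this example.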

5. If $\lambda > 0$ and $f$ is the characteristic function of a random variable, show
that $\exp[\lambda(f - 1)]$ is an infinitely divisible characteristic function.


7.6 UNIFORM CONVERGENCE IN THE CENTRAL LIMIT THEOREM

Let $X_1, X_2, \ldots$ be independent random variables with finite mean and
variance, and suppose that $T_n = c_n^{-1}(S_n - E(S_n)) \xrightarrow{d} X^*$ normal (0, 1). Very
often the statement is made that "for large $n$, $S_n$ is approximately normal with
mean $a_n = E(S_n)$ and variance $c_n^2$," that is,

$$F_{S_n}(x) - \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,c_n}\exp\left[-\frac{(t - a_n)^2}{2c_n^2}\right]dt \to 0 \quad \text{as } n \to \infty.$$

Let us try to prove this. If $X$ is normal $(a_n, c_n^2)$ and $X^*$ is normal (0, 1),
then

$$|P\{S_n \le x\} - P\{X \le x\}| = \left|P\left\{T_n \le \frac{x - a_n}{c_n}\right\} - P\left\{X^* \le \frac{x - a_n}{c_n}\right\}\right|
= \left|F_{T_n}\left(\frac{x - a_n}{c_n}\right) - F^*\left(\frac{x - a_n}{c_n}\right)\right|,$$

and this will approach 0 as $n \to \infty$ if $F_{T_n} \to F^*$ uniformly on $\mathbb{R}$. In fact this
does happen; the proof rests on the following two results.
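To see the uniform convergence in a concrete case, one can compute $\sup_x |F_{T_n}(x) - F^*(x)|$ exactly when $S_n$ is binomial. The sketch below is an illustration only: the choice of the binomial distribution with $p = 1/2$ and the function names are assumptions, not part of the text. Because $F_{T_n}$ is a step function and $F^*$ is continuous and increasing, the supremum is attained at a jump of $F_{T_n}$, comparing both the left limit and the value there.

```python
import math

def std_normal_cdf(x):
    """Distribution function F* of the normal (0, 1) distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance_binomial(n, p=0.5):
    """sup over x of |F_{T_n}(x) - F*(x)|, where S_n is Binomial(n, p)
    and T_n = (S_n - n p) / sqrt(n p (1 - p))."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    cdf_left = 0.0  # value of F_{T_n} just to the left of the current jump
    worst = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        phi = std_normal_cdf((k - mean) / sd)
        cdf_right = cdf_left + pmf
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst

if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, sup_distance_binomial(n))
```

The distances decrease as $n$ grows, in line with the uniform convergence established by 7.6.1 and 7.6.2.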


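7.6.1 Theorem. Let $F, F_1, F_2, \ldots$ be bounded distribution functions on $\mathbb{R}$,
$S$ a dense subset of $\mathbb{R}$ containing all the discontinuity points of $F$.
If

$$F_n(\infty) \to F(\infty), \qquad F_n(-\infty) \to F(-\infty),$$
$$F_n(x) \to F(x) \quad \text{for all } x \in S,$$
$$F_n(x^-) \to F(x^-) \quad \text{for all } x \in S,$$

then $F_n \to F$ uniformly on $\mathbb{R}$.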


Proof. Let $\varepsilon > 0$ be given. We wish to obtain a partition $-\infty = y_0 < y_1
< \cdots < y_m = \infty$ with $y_j \in S$ for $1 \le j \le m - 1$ and

$$F(y_{j+1}^-) \le F(y_j) + \frac{\varepsilon}{2}, \quad 0 \le j \le m - 1,$$

$$F(y_{j+1}) \ge F(y_j) + \frac{\varepsilon}{3}, \quad 0 \le j \le m - 2$$

(take $y_m = \infty$).

Set $y_0 = -\infty$ and define $z_1 = \sup\{x > y_0: F(x) - F(y_0) < \varepsilon/3\}$. If $z_1
< \infty$, then $F(x) - F(y_0) < \varepsilon/3$ for $y_0 < x < z_1$; hence $F(z_1^-) \le F(y_0) + (\varepsilon/3)
< F(y_0) + (\varepsilon/2)$. Also, $F(z_1) \ge F(y_0) + (\varepsilon/3)$; for if not, $F(z_1') < F(y_0)
+ (\varepsilon/3)$ for some $z_1' > z_1$ (by right-continuity), contradicting the definition
of $z_1$.

Now if $F(z_1^-) < F(z_1)$, then $z_1 \in S$ by hypothesis, and we set $y_1 = z_1$. If
$F(z_1^-) = F(z_1)$, then since $S$ is dense and $F$ is right-continuous, we can find
$y_1 \in S$ such that $y_1 > z_1$ and $F(y_1) < F(y_0) + (\varepsilon/2)$. Thus in either case we
obtain $y_1 \in S$ such that $F(y_1^-) \le F(y_0) + (\varepsilon/2)$ and $F(y_1) \ge F(y_0) + (\varepsilon/3)$.
Continue by defining $z_2 = \sup\{x > y_1: F(x) - F(y_1) < \varepsilon/3\}$ and proceed as
above. Since $F(y_{j+1}) \ge F(y_j) + (\varepsilon/3)$ and $F$ is bounded, the process will
terminate in a finite number of steps and produce the desired partition.

Let $x \in \mathbb{R}$; say $y_j \le x < y_{j+1}$. Then

$$F_n(x) - F(x) \le F_n(y_{j+1}^-) - F(y_j)
\le F(y_{j+1}^-) + \frac{\varepsilon}{2} - F(y_j) \quad \text{for large } n, \text{ since } y_{j+1} \in S \text{ or } y_{j+1} = \infty,$$
$$\le \varepsilon.$$

Also,

$$F(x) - F_n(x) \le F(y_{j+1}^-) - F_n(y_j)
\le F(y_{j+1}^-) - F(y_j) + \frac{\varepsilon}{2} \quad \text{for large } n, \text{ since } y_j \in S \text{ or } y_j = -\infty,$$
$$\le \varepsilon.$$

The "large $n$" depends only on $j$ and not on $x$, and since there are only
finitely many indices $j$, it follows that $F_n \to F$ uniformly on $\mathbb{R}$. □


7.6.2 Theorem. Let $F, F_1, F_2, \ldots$ be bounded distribution functions on $\mathbb{R}$.
Assume that $F_n(-\infty) = 0$ for all $n$, $F(-\infty) = 0$, $F$ is continuous everywhere,
and $F_n$ converges weakly to $F$. Then $F_n$ converges to $F$ uniformly on $\mathbb{R}$.


Proof. In 7.6.1, take $S = \mathbb{R}$. Since $F_n$ converges weakly to $F$ and $F$ is
continuous everywhere, $F_n(\infty) \to F(\infty)$ and $F_n(x) \to F(x)$ for all $x \in \mathbb{R}$.
Since $F_n(-\infty) \to F(-\infty)$ by assumption, it remains to show that $F_n(x^-)
\to F(x^-)$ for all $x$. If $y < x$,

$$F(y) = \lim_{n\to\infty} F_n(y) \le \liminf_{n\to\infty} F_n(x^-) \le \limsup_{n\to\infty} F_n(x^-) \le \lim_{n\to\infty} F_n(x) = F(x).$$

If $y \to x$, then $F(y) \to F(x)$ by continuity of $F$; hence $F_n(x^-) \to F(x)
= F(x^-)$. □
Problem


1. (Glivenko-Cantelli Theorem) Let $X_1, X_2, \ldots$ be independent, identically
distributed random variables with common distribution function $F$. Let
$F_n(x, \omega)$, $x \in \mathbb{R}$, $\omega \in \Omega$, be the empirical distribution function of the $X_i$,
based on $n$ trials, that is,

$$F_n(x, \omega) = \frac{1}{n}\,[\text{the number of terms among } X_1(\omega), \ldots, X_n(\omega) \text{ that are} \le x]
= \frac{1}{n}\sum_{k=1}^{n} I_{\{X_k \le x\}}(\omega).$$

For example, if $n = 3$, $X_1(\omega) = 2$, $X_2(\omega) = 5$, $X_3(\omega) = 7$, then

$$F_n(x, \omega) = \begin{cases} 0 & \text{if } x < 2, \\ 1/3 & \text{if } 2 \le x < 5, \\ 2/3 & \text{if } 5 \le x < 7, \\ 1 & \text{if } x \ge 7. \end{cases}$$

Intuitively, for large $n$, $F_n(x, \omega)$ should approximate $F(x)$. Show that
there is a set $A$ of probability 0 such that for $\omega \notin A$, $F_n(x, \omega) \to F(x)$
uniformly for $x \in \mathbb{R}$.
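The empirical distribution function in this problem is straightforward to compute. The following sketch reproduces the three-point example above; the function names are illustrative assumptions, and `sup_distance` presumes a continuous $F$, so that the supremum of $|F_n - F|$ is reached at a sample point, from the left or from the right.

```python
from bisect import bisect_right
from fractions import Fraction

def empirical_cdf(sample):
    """Return F_n, where F_n(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        return Fraction(bisect_right(ordered, x), n)

    return F_n

def sup_distance(sample, F):
    """sup over x of |F_n(x) - F(x)| for a continuous distribution function F,
    evaluated at the jumps of F_n (left limit and value at each jump)."""
    ordered = sorted(sample)
    n = len(ordered)
    worst = 0.0
    for k, x in enumerate(ordered):
        worst = max(worst, abs(k / n - F(x)), abs((k + 1) / n - F(x)))
    return worst

if __name__ == "__main__":
    F_3 = empirical_cdf([2, 5, 7])
    print([F_3(x) for x in (1, 2, 4, 5, 6, 7, 8)])
```

In a simulation one would call `sup_distance` on larger and larger samples; the problem asserts that this quantity tends to 0 for almost every $\omega$.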


7.7 THE SKOROKHOD CONSTRUCTION AND OTHER CONVERGENCE THEOREMS

In this section we look at various results involving the interplay between 
convergence in distribution and other types of convergence. In particular, the 
Skorokhod construction frequently allows a very effective substitution of al- 
most everywhere convergence for the weaker convergence in distribution. We 
begin with a result about convergence in distribution of sums, products, and 
quotients. 


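7.7.1 Slutsky's Theorem. If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$ (hence $Y_n \xrightarrow{P} c$
by 7.1.7), then

(a) $X_n + Y_n \xrightarrow{d} X + c$

(b) $X_n Y_n \xrightarrow{d} cX$

(c) $\dfrac{X_n}{Y_n} \xrightarrow{d} \dfrac{X}{c}$ if $c \ne 0$.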


Proof. (a) Let $F_n$ be the distribution function of $X_n$, $G_n$ the distribution
function of $X_n + Y_n$, $F$ the distribution function of $X$, and $H$ the distribution
function of $X + c$. If $X_n + Y_n \le x$ and $|Y_n - c| < \varepsilon$, then $X_n \le x - Y_n$
and $Y_n > c - \varepsilon$, so that $X_n \le x - c + \varepsilon$. Similarly, if $X_n \le x - c - \varepsilon$ and
$|Y_n - c| < \varepsilon$ then $c > Y_n - \varepsilon$, so that $X_n \le x - Y_n$. Therefore

$$G_n(x) \le F_n(x - c + \varepsilon) + P\{|Y_n - c| \ge \varepsilon\}$$

and

$$F_n(x - c - \varepsilon) \le G_n(x) + P\{|Y_n - c| \ge \varepsilon\}.$$

Let $x$ be a continuity point of $H$, so that $x - c$ is a continuity point of $F$.
Choose $\varepsilon > 0$ so that $x - c + \varepsilon$ and $x - c - \varepsilon$ are continuity points of $F$ as
well. Then

$$F(x - c - \varepsilon) \le \liminf_{n\to\infty} G_n(x) \le \limsup_{n\to\infty} G_n(x) \le F(x - c + \varepsilon).$$

Since $\varepsilon$ can be as small as we wish, it follows that $G_n(x) \to F(x - c) = H(x)$.


(b) We use the same notation as in (a), with $X_n + Y_n$ replaced by $X_n Y_n$
and $X + c$ replaced by $cX$. First assume that $c \ne 0$. Then we can take $c > 0$
without loss of generality, since $-Y_n \xrightarrow{d} -c$. The idea is essentially the
same as in (a).

If $X_n Y_n \le x$ and $|Y_n - c| < \varepsilon$, then $X_n \le x/Y_n$ and $Y_n > c - \varepsilon$, so
that $X_n \le x/(c - \varepsilon)$. Similarly, if $X_n \le x/(c + \varepsilon)$ and $|Y_n - c| < \varepsilon$ then $c + \varepsilon > Y_n$ and
$X_n \le x/Y_n$. Therefore,

$$G_n(x) \le F_n\left(\frac{x}{c - \varepsilon}\right) + P\{|Y_n - c| \ge \varepsilon\}$$

and

$$F_n\left(\frac{x}{c + \varepsilon}\right) \le G_n(x) + P\{|Y_n - c| \ge \varepsilon\}.$$

Let $x$ be a continuity point of $H$, so that $x/c$ is a continuity point of $F$.
Choose $\varepsilon > 0$ so that $x/(c - \varepsilon)$ and $x/(c + \varepsilon)$ are continuity points of $F$ as well.
Then

$$F\left(\frac{x}{c + \varepsilon}\right) \le \liminf_{n\to\infty} G_n(x) \le \limsup_{n\to\infty} G_n(x) \le F\left(\frac{x}{c - \varepsilon}\right),$$

and since $\varepsilon$ can be taken as small as we wish, it follows that $G_n(x)
\to F(x/c) = H(x)$. [We have assumed that $Y_n > 0$ in the above manipulations,
and this causes no difficulty because $Y_n > \frac{1}{2}c > 0$ except on a set of
small probability. A similar comment applies to the proof of (c) below.]

If $c = 0$ we must show that $X_n Y_n \xrightarrow{P} 0$. This may be accomplished by
observing that $\{|X| < M\} \uparrow \Omega$ as $M \to \infty$, and given $\varepsilon > 0$, we have $P\{|Y_n|
< \varepsilon/M\} \to 1$ as $n \to \infty$. Thus for sufficiently large $n$, we have $|X_n Y_n| < \varepsilon$
except on a set of small probability.


(c) As in (b) we can assume without loss of generality that $c > 0$. We use
the same notation as in (a) with $X_n + Y_n$ replaced by $X_n/Y_n$ and $X + c$ replaced
by $X/c$. If $X_n/Y_n \le x$ and $|Y_n - c| < \varepsilon$ then $Y_n < c + \varepsilon$ and $X_n \le x(c + \varepsilon)$.
Similarly, if $X_n \le x(c - \varepsilon)$ and $|Y_n - c| < \varepsilon$ then $c < Y_n + \varepsilon$ and $X_n \le xY_n$.
Therefore,

$$G_n(x) \le F_n(x(c + \varepsilon)) + P\{|Y_n - c| \ge \varepsilon\}$$

and

$$F_n(x(c - \varepsilon)) \le G_n(x) + P\{|Y_n - c| \ge \varepsilon\}.$$

The argument is completed as in (b) with $x/c$ replaced by $cx$, $x/(c - \varepsilon)$ by
$x(c + \varepsilon)$ and $x/(c + \varepsilon)$ by $x(c - \varepsilon)$. □


We now give the Skorokhod construction. 
7.7.2 Skorokhod's Theorem. Assume that $X_n \xrightarrow{d} X$, and let $F_n$ be the
distribution function of $X_n$, with $F$ the distribution function of $X$. Then there
are random variables $Y_n$ ($n = 1, 2, \ldots$) and $Y$ on (0, 1) (with Borel sets and
Lebesgue measure) such that $Y_n$ has the same distribution as $X_n$, $Y$ has the
same distribution as $X$, and $Y_n(\omega) \to Y(\omega)$ for every $\omega \in (0, 1)$.


Proof. The idea is that since $F_n$ converges pointwise to $F$ except possibly at
discontinuity points of $F$, the inverse of $F_n$ should converge to the inverse of
$F$. Because the distribution function $F$ is increasing and right-continuous,
for each $\omega \in (0, 1)$ there will be a minimum value of $x$, say $x = x_0$, such that
$F(x) \ge \omega$, and we have $F(x) \ge \omega$ for all $x \ge x_0$, and $F(x) < \omega$ for all $x < x_0$.
We define

$$Y(\omega) = x_0 = \min\{x: F(x) \ge \omega\}; \quad \text{see Fig. 7.7.1.}$$

Figure 7.7.1.

From the definition we have

(1) $Y(\omega) \le x$ iff $F$ has reached height $\omega$ at $x$ or earlier iff $F(x) \ge \omega$.

Similarly, we define $Y_n(\omega) = \min\{x: F_n(x) \ge \omega\}$; then $Y_n(\omega) \le x$ iff $F_n(x) \ge \omega$.

If $\lambda$ is Lebesgue measure then $\lambda\{\omega: Y_n(\omega) \le x\} = \lambda\{\omega: \omega \le F_n(x)\}
= F_n(x)$, so $Y_n \stackrel{d}{=} X_n$, and similarly $Y \stackrel{d}{=} X$. We will prove that

(2) $\liminf_{n\to\infty} Y_n(\omega) \ge Y(\omega)$.

Let $\omega \in (0, 1)$ and $\varepsilon > 0$. Choose a continuity point $x$ of $F$ such that $Y(\omega)
- \varepsilon < x < Y(\omega)$; see Fig. 7.7.2. By (1), $Y(\omega) > x$ implies that $F(x) < \omega$, and
since $F_n(x) \to F(x)$, we have $F_n(x) < \omega$ for all sufficiently large $n$. Again by
(1), $Y_n(\omega) > x$ eventually. Thus $\liminf_{n\to\infty} Y_n(\omega) \ge x > Y(\omega) - \varepsilon$, and since $\varepsilon$ is arbitrary,
the result follows.
The analogous result for $\limsup_{n\to\infty} Y_n(\omega)$ is more delicate because
$\omega < F(x)$ is not equivalent to $Y(\omega) < x$. For example, see Fig. 7.7.1 with
$x = x_0$.

(3) If $Y$ is continuous at $\omega$ then $\limsup_{n\to\infty} Y_n(\omega) \le Y(\omega)$.


Let $\omega$ and $\omega'$ be in (0, 1) with $\omega < \omega'$, and let $\varepsilon > 0$. Choose a continuity
point $y$ of $F$ such that $Y(\omega') < y < Y(\omega') + \varepsilon$ (Fig. 7.7.3). Now $Y(\omega') < y$
implies that $F(Y(\omega')) \le F(y)$, and since $Y(\omega')$ is the first point at which $F$
reaches height $\omega'$, we have $F(Y(\omega')) \ge \omega'$. Thus $\omega < \omega' \le F(Y(\omega')) \le F(y)$
(Fig. 7.7.4).

Figure 7.7.3.

Figure 7.7.4.

But $F_n(y) \to F(y)$, so for all sufficiently large $n$, $F_n(y) > \omega$. By (1), $Y_n(\omega)
\le y < Y(\omega') + \varepsilon$ (again see Fig. 7.7.3). Consequently,

$$\limsup_{n\to\infty} Y_n(\omega) \le Y(\omega') \to Y(\omega) \quad \text{as } \omega' \to \omega,$$

and the result follows.

By definition, $Y(\omega)$ increases with $\omega$ and therefore has at most countably
many discontinuities. Since a countable set has Lebesgue measure zero, we are
free to change the definitions of $Y_n$ and $Y$ at points $\omega$ where $Y$ is discontinuous.
In particular, it is convenient to set $Y(\omega) = Y_n(\omega) = 0$ for all $n$ at such points. Skorokhod's
theorem now follows from (2) and (3). □
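The map $Y(\omega) = \min\{x: F(x) \ge \omega\}$ can be computed directly when the distribution has finite support. The sketch below is an illustration under that assumption; the particular family `cdf_n` (mass $1/2 - 1/(n+2)$ at 0 and the rest at 1, converging in distribution to a fair 0/1 variable) is a hypothetical example chosen to show why the values at discontinuity points of $Y$ may need to be redefined.

```python
from fractions import Fraction

def quantile(support, cdf_values, omega):
    """Y(omega) = min{x : F(x) >= omega} for a distribution on a finite,
    increasingly sorted support with distribution-function values cdf_values."""
    for x, Fx in zip(support, cdf_values):
        if Fx >= omega:
            return x
    raise ValueError("omega must lie in (0, 1) and F must reach 1")

SUPPORT = [0, 1]

def cdf_limit():
    """F for the limit X: P{X = 0} = P{X = 1} = 1/2."""
    return [Fraction(1, 2), Fraction(1)]

def cdf_n(n):
    """F_n for X_n: P{X_n = 0} = 1/2 - 1/(n + 2), P{X_n = 1} = 1/2 + 1/(n + 2)."""
    return [Fraction(1, 2) - Fraction(1, n + 2), Fraction(1)]

if __name__ == "__main__":
    for omega in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 4)):
        limit = quantile(SUPPORT, cdf_limit(), omega)
        values = [quantile(SUPPORT, cdf_n(n), omega) for n in (1, 10, 100)]
        print(omega, limit, values)
```

Here $Y$ jumps at $\omega = 1/2$, and at that single point $Y_n(\omega) = 1$ for every $n$ while $Y(\omega) = 0$; at every other $\omega$ the values $Y_n(\omega)$ eventually equal $Y(\omega)$.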


Part (c) of the next theorem gives a typical application of the Skorokhod 
construction [cf. 2.8.1(c)]. 


7.7.3 Convergence of Transformed Sequences. Let $X, X_1, X_2, \ldots$ be random
variables on $(\Omega, \mathscr{F}, P)$ and let $g: \mathbb{R} \to \mathbb{R}$ be continuous a.e. $[P_X]$. Then
(a) $X_n \to X$ a.e. implies $g(X_n) \to g(X)$ a.e.
(b) $X_n \xrightarrow{P} X$ implies $g(X_n) \xrightarrow{P} g(X)$.
(c) $X_n \xrightarrow{d} X$ implies $g(X_n) \xrightarrow{d} g(X)$.
Proof. (a) Suppose that $X_n(\omega) \to X(\omega)$ for $\omega \in A$, where $P(A) = 1$, and $g$
is continuous on $B$, where $P_X(B) = 1$. If $\omega \in A \cap X^{-1}(B)$ then $X_n(\omega) \to X(\omega)$
and $g$ is continuous at $X(\omega)$, so $g(X_n(\omega)) \to g(X(\omega))$. Since $P(A \cap X^{-1}(B))
= 1$, the result follows.

(b) If $g(X_n)$ fails to converge in probability to $g(X)$, then there exist $\varepsilon > 0$
and $\delta > 0$, and a subsequence of $\{1, 2, \ldots\}$ such that on the entire subsequence,

$$P\{|g(X_n) - g(X)| \ge \varepsilon\} \ge \delta.$$

We can then extract a further subsequence on which $X_n \to X$ a.e., and
consequently $g(X_n) \to g(X)$ a.e. by (a). On the second subsequence we must have
$g(X_n) \xrightarrow{P} g(X)$, contradicting our choice of the first subsequence.

(c) By 7.7.2, there are random variables $Y_n \stackrel{d}{=} X_n$ and $Y \stackrel{d}{=} X$ with $Y_n \to Y$
a.e., and by (a), $g(Y_n) \to g(Y)$ a.e. By 7.1.7, $g(Y_n) \xrightarrow{d} g(Y)$. But $X_n \stackrel{d}{=} Y_n$
implies $g(X_n) \stackrel{d}{=} g(Y_n)$, and $X \stackrel{d}{=} Y$ implies $g(X) \stackrel{d}{=} g(Y)$. Therefore $g(X_n)
\xrightarrow{d} g(X)$. □


Problems


1. If $X_n \xrightarrow{d} X$, $a_n \to a$ and $b_n \to b$, show that $a_n X_n + b_n \xrightarrow{d} aX + b$.

2. If $X_n \xrightarrow{d} X$, show that $E(|X|) \le \liminf_{n\to\infty} E(|X_n|)$.

3. If $X_n \xrightarrow{d} X$ and the $X_n$ are uniformly integrable, show that $X$ is
integrable and $E(X_n) \to E(X)$.


7.8 THE k-DIMENSIONAL CENTRAL LIMIT THEOREM

Just as the sum of a large number of independent random variables is, 
under wide conditions, approximately normal, the sum of a large number 
of independent random vectors has, approximately, a Gaussian (also called 
multivariate normal) distribution. In this section we give a version of the 
central limit theorem for k-dimensional random vectors. 


7.8.1 Definitions and Comments. First we recall some definitions and
results from 1.4 and 2.8. On $\mathbb{R}^k$ we have a partial ordering: $x \le y$ iff $x_i \le y_i$
for every $i = 1, \ldots, k$, and $x < y$ iff $x_i < y_i$ for every $i = 1, \ldots, k$.

If $F: \mathbb{R}^k \to \mathbb{R}$, we say that $F$ is right-continuous at $x$ iff $F(x) = \lim_n F(u_n)$
for every sequence $u_1 \ge u_2 \ge \cdots \to x$. If $F$ satisfies the property that $x \le y$
implies $F(x) \le F(y)$, we may assume in the definition of right-continuity that
the $u_n$ are all strictly greater than $x$.

If $F: \mathbb{R}^k \to \mathbb{R}$, we say that $F$ is increasing iff $F(a, b] \ge 0$ whenever
$a \le b$. This is not equivalent to saying that $a \le b$ implies $F(a) \le F(b)$; neither
condition implies the other in general.
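A small computation shows that the two notions really are independent. The sketch below assumes the usual definition of $F(a, b]$ from 1.4 as the signed sum of $F$ over the $2^k$ vertices of the rectangle, with sign $(-1)^m$ where $m$ is the number of coordinates taken from $a$; the two test functions on $\mathbb{R}^2$ are illustrative choices, not examples from the text.

```python
from itertools import product

def rect_increment(F, a, b):
    """F(a, b]: signed sum of F over the vertices of the rectangle (a, b] in R^k.
    A vertex takes each coordinate from a or from b; its sign is (-1) raised to
    the number of coordinates taken from a."""
    k = len(a)
    total = 0
    for choice in product((0, 1), repeat=k):
        vertex = tuple(b[i] if c else a[i] for i, c in enumerate(choice))
        from_a = k - sum(choice)
        total += (-1) ** from_a * F(vertex)
    return total

def F_prod(x):
    """F(x1, x2) = x1 * x2: increasing in the rectangle sense, not monotone."""
    return x[0] * x[1]

def F_max(x):
    """F(x1, x2) = max(x1, x2): monotone in the partial order, not increasing."""
    return max(x)

if __name__ == "__main__":
    # F_prod(a, b] is the area (b1 - a1)(b2 - a2) >= 0, yet F_prod decreases
    # from (-1, -1) to (0, 0).
    print(rect_increment(F_prod, (-2, -3), (1, 4)), F_prod((-1, -1)), F_prod((0, 0)))
    # F_max is monotone, yet assigns negative "mass" to the unit square.
    print(rect_increment(F_max, (0, 0), (1, 1)))
```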

An increasing, right-continuous function $F: \mathbb{R}^k \to \mathbb{R}$ is called a distribution
function on $\mathbb{R}^k$. If $F$ is a distribution function, there is a Lebesgue-Stieltjes
measure $\mu$ on $(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$ such that $\mu(a, b] = F(a, b]$ for all $a, b \in \mathbb{R}^k$.
Conversely, if $\mu$ is a Lebesgue-Stieltjes measure on $(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$, there are
several distribution functions $F$ such that $\mu(a, b] = F(a, b]$ for all $a, b \in \mathbb{R}^k$.
If $\mu$ is a finite measure, it is easiest to use the distribution function $G(x)
= \mu(-\infty, x]$. Let us spell out in some detail the relation between $\mu$ and $G$.


7.8.2 Theorem. (a) Let $\mu$ be a finite measure on $(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$, and let
$G(x) = \mu(-\infty, x]$. Then $G$ is a distribution function such that $a \le b$ implies
that $G(a) \le G(b)$. Furthermore, for every $\varepsilon > 0$ there is an $A > 0$ such that if
$x_j < -A$ for at least one coordinate $x_j$, then $G(x) < \varepsilon$.

(b) Conversely, suppose that $G$ is a distribution function on $\mathbb{R}^k$ with the
property that for every $\varepsilon > 0$ there is an $A > 0$ such that if $x_j < -A$ for
at least one $j$, then $G(x) < \varepsilon$. Then there is a unique Lebesgue-Stieltjes
measure $\mu$ on $(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$ such that $G(x) = \mu(-\infty, x]$ for all $x$. [By
(a), we have $G(a) \le G(b)$ when $a \le b$. The measure $\mu$ is finite iff $\sup_x G(x)
< \infty$.]


Proof. (a) By 1.4.8 and the discussion following it, $G$ is a distribution
function, and since $\mu$ is a (nonnegative) measure, $a \le b$ implies $G(a) \le G(b)$.
Given $\varepsilon > 0$, choose $A > 0$ such that $\mu(\mathbb{R}^k - [-A, A]^k) < \varepsilon$. If $x_j < -A$ for
at least one $j$, then $(-\infty, x] \subset \mathbb{R}^k - [-A, A]^k$, so that $G(x) < \varepsilon$.

(b) By 1.4.9 there is a unique Lebesgue-Stieltjes measure $\mu$ on
$(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$ such that $\mu(a, b] = G(a, b]$ when $a \le b$. If $a \to -\infty$ in $\overline{\mathbb{R}}^k$,
that is, each coordinate of $a$ approaches $-\infty$, then $\mu(a, b] \to \mu(-\infty, b]$ by
1.2.7(a). But $G(a, b] \to G(b)$, since every term in 1.4.8(b) goes to 0 except
for $G(b)$. Therefore $G(b) = \mu(-\infty, b]$. □


In one dimension, $G$ is continuous at $x$ iff $\mu\{x\} = 0$. We can formulate this
condition in such a way that it generalizes to the higher-dimensional case, as
follows.


7.8.3 Theorem. Let $G(x) = \mu(-\infty, x]$, where $\mu$ is a finite measure on
$(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$. Then $G$ is continuous at $x$ iff $\mu(\partial(-\infty, x]) = 0$ (where $\partial$ stands
for boundary).

Figure 7.8.1. Proof of Theorem 7.8.3.


Proof. If $G$ is continuous at $x$ then $\mu(-\infty, x) = \lim\{G(y): y \uparrow x, y < x\}
= G(x)$, and it follows that $\mu(\partial(-\infty, x]) = 0$. Conversely, if $\mu(\partial(-\infty, x]) = 0$,
then

$$\lim\{G(y): y \uparrow x, y < x\} = \lim\{G(z): z \downarrow x, z > x\} = G(x)$$

(by right-continuity of $G$).

Given $\varepsilon > 0$, there exist $x_1 < x$ and $x_2 > x$ such that $0 \le G(x) - G(x_1)
< \varepsilon$ and $0 \le G(x_2) - G(x) < \varepsilon$ (see Fig. 7.8.1). If $y \in [x_1, x_2]$, we have $G(x_1)
\le G(y) \le G(x_2)$, and therefore $G(y) \le G(x_2) < G(x) + \varepsilon$ and $G(y) \ge G(x_1)
> G(x) - \varepsilon$. Thus $|G(y) - G(x)| < \varepsilon$. □


We now consider the hyperplanes that are perpendicular to one of the
coordinate axes, that is, sets of the form $\{x = (x_1, \ldots, x_k): x_i = a\}$, where $1 \le i \le k$
and $a \in \mathbb{R}$. As the measure $\mu$ is finite, only countably many of these
hyperplanes can have strictly positive $\mu$-measure (see 1.2, Problem 12). If $H_\mu$ is
the set of these hyperplanes, we have the following result.


7.8.4 Theorem. If $x$ does not belong to any of the hyperplanes in $H_\mu$, then
the distribution function $F(x) = \mu(-\infty, x]$ is continuous at $x$.


Proof. This follows immediately from 7.8.3. □


Recall that a sequence of finite measures $\mu_n$ on $\mathbb{R}^k$ converges weakly to a
finite measure $\mu$ iff $\int f\,d\mu_n \to \int f\,d\mu$ for every bounded continuous $f: \mathbb{R}^k \to \mathbb{R}$.

Theorem 2.8.1 gives several conditions equivalent to the definition of weak 
convergence, and as the theorem was proved for finite measures on the Borel 
sets of an arbitrary metric space, it remains valid in R*. We may also establish 
a k-dimensional analog of Theorem 2.8.4, as follows. 


7.8.5 Theorem. Let $\mu, \mu_1, \mu_2, \ldots$ be finite measures on $(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$ with
corresponding distribution functions $F, F_1, F_2, \ldots$, where we take $F_n(x)
= \mu_n(-\infty, x]$ and $F(x) = \mu(-\infty, x]$. The following conditions are equivalent:


(a) $\mu_n$ converges to $\mu$ weakly.

(b) $\mu_n(\mathbb{R}^k) \to \mu(\mathbb{R}^k)$ as $n \to \infty$ and $F_n(x) \to F(x)$ at each continuity
point of $F$ in $\mathbb{R}^k$.

(c) $\mu_n(\mathbb{R}^k) \to \mu(\mathbb{R}^k)$ as $n \to \infty$ and $F_n(x) \to F(x)$ at each point $x \in \mathbb{R}^k$
that does not lie in any of the hyperplanes of $H_\mu$.


Proof. (a) implies (b): Take $f \equiv 1$ to show that $\mu_n(\mathbb{R}^k) \to \mu(\mathbb{R}^k)$. If $x$ is
a continuity point of $F$, then $\mu(\partial(-\infty, x]) = 0$ by 7.8.3, so $F_n(x) \to F(x)$ by
2.9.1(e).

(b) implies (c): This follows from 7.8.4.

(c) implies (a): Given $\varepsilon > 0$ we can find a positive number $T$ such that
none of the vertices of $(-T, T]^k$ lie in a hyperplane of $H_\mu$, and such that
$\mu(\mathbb{R}^k - (-T, T]^k) < \varepsilon$. Then $\mu_n((-T, T]^k) \to \mu((-T, T]^k)$ and $\mu_n(\mathbb{R}^k)
\to \mu(\mathbb{R}^k)$ by (c), and therefore $\mu_n(\mathbb{R}^k - (-T, T]^k) < 2\varepsilon$ for all sufficiently
large $n$.

If $f$ is a bounded continuous function on $\mathbb{R}^k$, it is uniformly continuous
on $[-T, T]^k$, so using hyperplanes not in $H_\mu$ we can cut $(-T, T]^k$ into a
finite number of rectangles $(u, v]$ such that if $x$ and $y$ belong to the same
rectangle then $|f(x) - f(y)| < \varepsilon$. Thus we may approximate $f$ on $(-T, T]^k$
by a function $g$ such that $g$ is constant on each $(u, v]$ and $|f - g| < \varepsilon$ on
$(-T, T]^k$. Then, noting that $f = (f - g) + g$, we have

$$\left|\int f\,d\mu_n - \int f\,d\mu\right| \le \int_{\mathbb{R}^k - (-T,T]^k} (|f|\,d\mu_n + |f|\,d\mu)
+ \int_{(-T,T]^k} (|f - g|\,d\mu_n + |f - g|\,d\mu)
+ \left|\int_{(-T,T]^k} g\,d\mu_n - \int_{(-T,T]^k} g\,d\mu\right|.$$

The right side of this inequality is less than some constant multiple of $\varepsilon$ for
sufficiently large $n$, and the result follows. □


We now begin the study of weak compactness on R*. 


7.8.6 Helly's Theorem. Let $\mu_1, \mu_2, \ldots$ be finite measures on $\mathbb{R}^k$ with
corresponding distribution functions $F_1, F_2, \ldots$. As above, we take $F_n(x)
= \mu_n(-\infty, x]$. If $\mu_n(\mathbb{R}^k) \le M < \infty$ for all $n$, there is a distribution function
$F$ on $\mathbb{R}^k$ and a subsequence $\{F_{n_j}\}$ such that $F_{n_j}(x) \to F(x)$ at every continuity
point $x$ of $F$ in $\mathbb{R}^k$.


Proof. The procedure is essentially the same as in 7.2.1, but we must
exercise some care in choosing the dense set $D$. Let $M$ be a dense subset of
$\mathbb{R}$, and take $D = M^k$. Then construct $F_D$ and $F$ exactly as in 7.2.1. We must
show that $F$ is increasing, that is, $F(a, b] \ge 0$ when $a \le b$. To accomplish
this we move the vertices of $(a, b]$ slightly to produce a new rectangle $(c, d]$
with $c, d \in D$ and $c > a$, $d > b$. All the vertices of $(c, d]$ are now in $D$, and
if $|c - a|$ and $|d - b|$ are small enough, we have $|F(a, b] - F(c, d]| < \varepsilon$. But
$F(c, d] = F_D(c, d] \ge 0$, and it follows that $F(a, b] \ge 0$. The proof that $F$ is
right-continuous and that $F_{n_j}(x) \to F(x)$ when $x$ is a continuity point of $F$
in $\mathbb{R}^k$ is exactly the argument in 7.2.1. [As $F_n(x)$ is defined as $\mu_n(-\infty, x]$,
we have $F_n(x) \le F_n(y)$ if $x \le y$; also, $F(x) \le F(y)$ if $x \le y$, by
construction.] □


In 7.2.3, we gave the definitions of tightness and relative compactness of a 
family of finite measures on the Borel sets of an arbitrary metric space, and 
this definition applies in $\mathbb{R}^k$. Thus, tightness means that given $\varepsilon > 0$, there is
a rectangle whose complement has measure less than $\varepsilon$, uniformly throughout
the family. Relative compactness means that every sequence from the family 
has a subsequence that converges weakly to a finite measure. Prokhorov’s 
theorem also carries over. 


7.8.7 Prokhorov's Theorem. Let $\mu_1, \mu_2, \ldots$ be finite measures on $(\mathbb{R}^k,
\mathscr{B}(\mathbb{R}^k))$ with corresponding distribution functions $F_1, F_2, \ldots$. Again we
assume that $F_n(x) = \mu_n(-\infty, x]$. Suppose that $\mu_n(\mathbb{R}^k) \le M$ for all $n$. Then
the sequence $\{\mu_n, n = 1, 2, \ldots\}$ is tight iff it is relatively compact.


Proof. Relative compactness implies tightness just as in 7.2.4. To prove that
tightness implies relative compactness, use Helly's theorem to get a
subsequence $\{n_j\}$ and a distribution function $F$ such that $F_{n_j}(x) \to F(x)$ at each
continuity point $x$ of $F$ in $\mathbb{R}^k$. Tightness implies that given $\varepsilon > 0$, there exists
$A > 0$ such that $\mu_n(\mathbb{R}^k - [-A, A]^k) < \varepsilon$ for all $n$. Therefore, if $x_j < -A$ for
at least one $j$ and $x$ is a continuity point of $F$, we have $F_n(x) < \varepsilon$ for all
$n$. Now the right-continuity of $F$ implies that if $x_j < -A$ for at least one $j$,
then $F(x) \le \varepsilon$. By 7.8.2(b), there is a measure $\mu$ on $(\mathbb{R}^k, \mathscr{B}(\mathbb{R}^k))$ such that
$F(x) = \mu(-\infty, x]$ for all $x$, and $\mu$ is finite because $\sup_x F(x) \le M$. Tightness
also implies that $\mu_{n_j}(\mathbb{R}^k) \to \mu(\mathbb{R}^k)$, and by 7.8.5, $\mu_{n_j}$ converges weakly to
$\mu$. This proves relative compactness. □


7.8.8 Remarks. In Prokhorov's theorem, if the $\mu_n$ are probability measures,
so is $\mu$ (since $\mu_{n_j}(\mathbb{R}^k) \to \mu(\mathbb{R}^k)$).

If the sequence of finite measures $\mu_1, \mu_2, \ldots$ is tight and there is a finite
measure $\mu$ such that any weakly convergent subsequence of $\{\mu_n\}$ converges
(weakly) to $\mu$, then the entire sequence $\mu_n$ converges weakly to $\mu$. For by
Prokhorov's theorem there is at least one weakly convergent subsequence.

Characteristic functions can be defined in a natural way in $\mathbb{R}^k$.


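7.8.9 Definition. Let $\mu$ be a finite measure on $\mathbb{R}^k$. The characteristic
function of $\mu$ is the mapping from $\mathbb{R}^k$ to $\mathbb{C}$ given by

$$h(u) = \int_{\mathbb{R}^k} \exp(i\langle u, x\rangle)\,d\mu(x), \quad u \in \mathbb{R}^k,$$

where $\langle u, x\rangle = \sum_{j=1}^{k} u_j x_j$.

If $\mu = P_X$, where $X$ is a random vector $(X_1, \ldots, X_k)$, then $h(u) =
E[\exp(i\langle u, X\rangle)]$.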

It follows that Theorem 7.1.2 can be extended to independent random 
vectors, in other words, the characteristic function of a sum of independent 
random vectors is the product of the individual characteristic functions. 

As in one dimension, the characteristic function of a finite measure 
determines the measure uniquely, because of the following result. 


7.8.10 Inversion Formula. If $h$ is the characteristic function of a finite
measure $\mu$, and if $A = (a, b]$ is a bounded rectangle in $\mathbb{R}^k$ such that $\mu(\partial A) = 0$,
then

$$\mu(A) = \lim_{c\to\infty} \frac{1}{(2\pi)^k}\int_{[-c,c]^k} \prod_{j=1}^{k} \frac{\exp(-iu_j a_j) - \exp(-iu_j b_j)}{iu_j}\,h(u)\,du.$$

Proof. Let

$$I_c = \frac{1}{(2\pi)^k}\int_{[-c,c]^k} \prod_{j=1}^{k} \frac{e^{-iu_j a_j} - e^{-iu_j b_j}}{iu_j}\,h(u)\,du$$

$$= \frac{1}{(2\pi)^k}\int_{[-c,c]^k} \prod_{j=1}^{k} \frac{e^{-iu_j a_j} - e^{-iu_j b_j}}{iu_j}
\int_{\mathbb{R}^k} \exp\left(i\sum_{p=1}^{k} u_p x_p\right)d\mu(x)\,du$$

$$= \int_{\mathbb{R}^k} \prod_{j=1}^{k} J_c(x_j, a_j, b_j)\,d\mu(x),$$

where

$$J_c(x_j, a_j, b_j) = \frac{1}{2\pi}\int_{-c}^{c} \frac{\sin u_j(x_j - a_j) - \sin u_j(x_j - b_j)}{u_j}\,du_j.$$

As in 7.1.3 we have $\lim_{c\to\infty} J_c(x_j, a_j, b_j) = J(x_j, a_j, b_j)$, where $J$ is 0 for
$x_j \notin [a_j, b_j]$, 1 for $x_j \in (a_j, b_j)$ and $\frac{1}{2}$ for $x_j = a_j$ or $b_j$. Therefore,

$$\lim_{c\to\infty} I_c = \int_{\mathbb{R}^k} K(x, a, b)\,d\mu,$$

where $K$ is 0 for $x \notin [a, b]$ and 1 for $x \in (a, b)$. On the boundary of
$A = (a, b]$, $K$ assumes values that are various powers of $\frac{1}{2}$. It follows that
$\lim_{c\to\infty} I_c = \mu(A)$ if $\mu(\partial A) = 0$. □

Here is the $k$-dimensional version of Lévy's theorem 7.2.9.


7.8.11 Theorem. Let $\{\mu_n, n = 1, 2, \ldots\}$ be a sequence of measures on $\mathbb{R}^k$
such that $\mu_n(\mathbb{R}^k) \le M < \infty$ for all $n$, and let $h_n$ be the characteristic function
of $\mu_n$. Let $\mu$ be a finite measure on $\mathbb{R}^k$ with characteristic function $h$. Then
$\mu_n \to \mu$ weakly iff $h_n(u) \to h(u)$ for every $u \in \mathbb{R}^k$.


Proof. If $\mu_n \to \mu$ weakly then $h_n(u) \to h(u)$ by 2.8.1, so assume $h_n(u) \to h(u)$
for all $u \in \mathbb{R}^k$. We consider vectors $u = te_j$, where $t$ is real and the coordinate
vector $e_j$ has a 1 in position $j$ and 0's elsewhere. Then

$$h_n^{(j)}(t) = h_n(te_j) = \int_{\mathbb{R}^k} e^{itx_j}\,d\mu_n(x)$$

is in fact the Fourier transform of the measure $m_n^{(j)}$ on $\mathbb{R}$ defined by

$$m_n^{(j)}(a, b] = \mu_n(P_j^{-1}(a, b]), \quad a, b \in \mathbb{R},$$

where $P_j$ is the projection $(x_1, \ldots, x_k) \to x_j$ on the $j$th coordinate axis. (Use
1.6.12 with $T = P_j$, $A = \mathbb{R}$ and $T^{-1}A = \mathbb{R}^k$.)

Since $h_n^{(j)}(t) \to h^{(j)}(t) = h(te_j)$ as $n \to \infty$, it follows from 7.2.8 that the
$m_n^{(j)}$, $n \ge 1$, are tight measures on $\mathbb{R}$, so we can find a positive number $r_j$
such that $m_n^{(j)}(\mathbb{R} - [-r_j, r_j]) < \varepsilon$ for all $n$. Then $\mu_n(\mathbb{R}^k - \prod_{j=1}^{k} [-r_j, r_j])
< k\varepsilon$. Consequently, the $\mu_n$, $n \ge 1$, are tight measures on $\mathbb{R}^k$. The condition
$h_n(u) \to h(u)$ assures that every weakly convergent subsequence of $\{\mu_n\}$
converges weakly to $\mu$. By 7.8.8, $\mu_n \to \mu$ weakly. □


The following result, which characterizes weak convergence of k-dimen- 
sional random vectors in terms of weak convergence on R, is a key step in 
the proof of the multivariate central limit theorem. 
7.8.12 Cramér-Wold Device. Let $\{X_n = (X_{n1}, \ldots, X_{nk}), n \ge 1\}$ be a
sequence of $k$-dimensional random vectors. Then the $X_n$ converge weakly
to the random vector $Y = (Y_1, \ldots, Y_k)$ if and only if

$$\sum_{j=1}^{k} u_j X_{nj} \xrightarrow{d} \sum_{j=1}^{k} u_j Y_j \quad \text{for every } u = (u_1, \ldots, u_k) \text{ in } \mathbb{R}^k. \tag{1}$$


Proof. Let $h_n$ be the characteristic function of $X_n$, and $h$ the characteristic
function of $Y$. Then by 7.2.9, condition (1) is equivalent to

$$h_n(t(u_1, \ldots, u_k)) \to h(t(u_1, \ldots, u_k)) \quad \text{for all } t \in \mathbb{R} \text{ and } u \in \mathbb{R}^k. \tag{2}$$

But if (2) holds, then (take $t = 1$ and apply 7.8.11) $X_n \xrightarrow{d} Y$. Conversely,
if $X_n \xrightarrow{d} Y$ then by 7.8.11, $h_n(v) \to h(v)$ for every $v \in \mathbb{R}^k$, in particular for
$v = tu$. □


If $X_1, \ldots, X_n$ are $k$-dimensional random vectors, with $X_n = (X_{n1}, \ldots, X_{nk})$,
the mean of $X_n$ is the vector $(m_{n1}, \ldots, m_{nk})$, where $m_{nj} = E(X_{nj})$. The
covariance of $X_n$ is the $k$ by $k$ matrix $\Sigma_n$ whose $ij$ entry is $\mathrm{Cov}(X_{ni}, X_{nj})$. If the
$X_i$ are iid, we may speak of the common mean $m = (m_1, \ldots, m_k)$ and covariance
$\Sigma$. The following result uses some basic properties of the $k$-dimensional
Gaussian distribution, which is discussed in Appendix 5.


7.8.13 A k-Dimensional Central Limit Theorem. Let $X_1, X_2, \ldots$ be iid
$k$-dimensional random vectors with finite mean $m$ and covariance $\Sigma$. If $S_n
= \sum_{i=1}^{n} X_i$ then $(S_n - nm)/\sqrt{n}$ converges weakly to $Y$, where $Y$ has a Gaussian
distribution with mean 0 and covariance $\Sigma$.


Proof. Let $Y$ have a Gaussian distribution with mean 0 and covariance $\Sigma$.
(There is such a random vector since $\Sigma$ is symmetric and nonnegative definite.)
By the Cramér-Wold device, it is sufficient to show that for every $u$ in $\mathbb{R}^k$,

$$T_n = \sum_{r=1}^{k} u_r\,\frac{S_{nr} - nm_r}{\sqrt{n}} \xrightarrow{d} \sum_{r=1}^{k} u_r Y_r,$$

where $S_{nr}$ is the $r$th coordinate of $S_n$. But the random variables
$Z_i = \sum_{r=1}^{k} u_r X_{ir}$ are iid with mean $\sum_{r=1}^{k} u_r m_r$ and
variance $\sum_{r=1}^{k}\sum_{s=1}^{k} u_r \sigma_{rs} u_s$, where $\sigma_{rs}$ is the $rs$ entry of $\Sigma$. [Note that
$\mathrm{Var}\,Z_i = \mathrm{Cov}(Z_i, Z_i)$.] Now,

$$T_n = \frac{\sum_{i=1}^{n} Z_i - nE(Z_1)}{\sqrt{n}},$$

and the one-dimensional central limit theorem says that $T_n$ converges weakly
to a normal distribution with mean 0 and variance $\sum_{r=1}^{k}\sum_{s=1}^{k} u_r \sigma_{rs} u_s$.
But this is precisely the distribution of $\sum_{r=1}^{k} u_r Y_r$. □
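The variance identity used in the proof, $\mathrm{Var}\left(\sum_r u_r X_r\right) = \sum_r\sum_s u_r \sigma_{rs} u_s$, can be checked exactly on a small discrete example. The sketch below is illustrative only: it takes $X$ uniformly distributed on a finite list of points in $\mathbb{R}^k$ (an assumption made so that all moments are finite sums), and the function names are hypothetical.

```python
from fractions import Fraction

def mean_and_cov(points):
    """Exact mean vector and covariance matrix of the uniform distribution
    on a finite list of points in R^k."""
    n, k = len(points), len(points[0])
    m = [Fraction(sum(p[r] for p in points), n) for r in range(k)]
    cov = [[sum((p[r] - m[r]) * (p[s] - m[s]) for p in points) / n
            for s in range(k)] for r in range(k)]
    return m, cov

def quadratic_form(u, cov):
    """Double sum of u_r * sigma_rs * u_s over r and s."""
    k = len(u)
    return sum(u[r] * cov[r][s] * u[s] for r in range(k) for s in range(k))

def variance_of_projection(u, points):
    """Variance of Z = <u, X>, computed directly from the values of Z."""
    z = [sum(ur * xr for ur, xr in zip(u, p)) for p in points]
    mz = Fraction(sum(z), len(z))
    return sum((zi - mz) ** 2 for zi in z) / len(z)

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (1, 1), (3, 2)]
    u = (2, -1)
    _, cov = mean_and_cov(pts)
    print(variance_of_projection(u, pts), quadratic_form(u, cov))
```

Both computations return the same rational number, which is the variance of the limiting normal distribution of $T_n$ for this choice of $u$.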
CHAPTER 8


ERGODIC THEORY


8.1 INTRODUCTION 

In Chapter 6 we proved the strong law of large numbers: if $X_1, X_2,
\ldots, X_n, \ldots$ is a sequence of independent, identically distributed random
variables with finite mean, then

$$\frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n} \to E(X_1) \quad \text{a.e.}$$

In this chapter we will generalize this result and prove the basic "pointwise
ergodic theorem."

The starting point for ergodic theory is the notion of a transformation that 
preserves the structure of the measure space, as defined below. 


8.1.1 Definition. Let $(\Omega, \mathscr{F}, \mu)$ be a measure space, and $T$ a measurable
transformation on $(\Omega, \mathscr{F}, \mu)$, that is, $T: (\Omega, \mathscr{F}) \to (\Omega, \mathscr{F})$.

The transformation $T$ is said to be measure-preserving (we also say that $T$
is $\mu$-preserving or that $T$ preserves $\mu$) iff $\mu(T^{-1}A) = \mu(A)$ for all $A \in \mathscr{F}$.


(This implies that $\mu(T^{-k}A) = \mu(A)$ for all $A \in \mathscr{F}$ and all $k = 1, 2, \ldots$,
where $T^{-k}A = \{\omega: T^k\omega \in A\}$ and $T^k$ is the composition of $T$ with itself $k$
times.)

The physical concept of a flow may be used to motivate the study of
measure-preserving transformations. A flow may be regarded as a process
in which a system of particles of a fluid (each point of the container
corresponding to a particle) moves about under the action of an externally applied
force. The force is assumed to be independent of time, so that, at least at
discrete times $t = 0, 1, 2, \ldots$, the flow can be described by a single
(measurable) function $T$. If $x$ is a point of the container, $Tx$ is the position of the
particle, originally at $x$, after one second has elapsed; thus $T^2x = T(Tx)$ is the
position after two seconds, and so on. If $A$ is a (Borel) subset of the container,
then $T^{-1}(A)$ corresponds to the set of particles that will be in $A$ after one
application of $T$. If $\mu$ is Lebesgue measure (volume in this case) and the fluid
is incompressible, it is reasonable to expect that $\mu(T^{-1}A) = \mu(A)$.


We consider some mathematical examples. 


8.1.2 Examples. 

1. Permutations. Let $\Omega$ be a finite set $\{x_1, \ldots, x_n\}$,
$n \ge 2$, with $\mathscr{F}$ consisting of all subsets of $\Omega$. Let $T$
be a cyclic permutation of $\Omega$, say, $T(x_i) = x_{i+1}$, with indices
reduced modulo $n$. Since $T^{-1}\{x_i\} = \{x_{i-1}\}$, $T$ preserves $\mu$
iff $\mu\{x_i\}$ is constant for all $i$. Thus if $\mu$ is a probability
measure $P$, then $P\{x_i\}$ must be $1/n$ for all $i$.


More generally, if $T$ is any permutation of $\Omega$, $T$ can be expressed as
a product of disjoint cycles $C_1, \ldots, C_r$. Then $T$ preserves $\mu$ iff
within each cycle $\mu$ assigns equal weight to each point.


2. Translations. Let $\Omega = \mathbb{R}$, $\mathscr{F} = \mathscr{B}(\mathbb{R})$,
and let $\mu$ be Lebesgue measure. If $T(x) = x + c$, $c$ constant, then $T$
preserves $\mu$ because $\mu$ is translation-invariant.


3. Rotations of the circle. Let $\Omega$ be the unit circle in the plane
$\mathbb{R}^2$ ($\Omega$ can be identified with the interval $[0, 2\pi)$ under
the correspondence $e^{i\theta} \leftrightarrow \theta$). Take $\mathscr{F}$ as
the Borel sets, and $\mu = P = \lambda/2\pi$, where $\lambda$ is arc length on
the circle (or Lebesgue measure on $[0, 2\pi)$). Thus if $A$ is a Borel subset
of $[0, 2\pi)$, $\mu(A) = \int_A (2\pi)^{-1}\, d\theta$.


Let $\alpha$ be fixed in $[0, 2\pi)$, and let $T$ be rotation by $\alpha$. Thus
$T$ is defined on $\Omega$ by $T(e^{i\theta}) = e^{i(\theta + \alpha)}$ or
equivalently, on $[0, 2\pi)$ by $T(\theta) = \theta + \alpha$ (modulo $2\pi$).
As in Example 2, $T$ preserves $\mu$ by the translation-invariance of
Lebesgue measure.

4. One-sided shifts. Let $\Omega = \mathbb{R}^\infty$, the collection of all
sequences $s = (s_0, s_1, \ldots)$ of real numbers; take
$\mathscr{F} = [\mathscr{B}(\mathbb{R})]^\infty$, and let $\mu$ be any
probability measure $P$ on $\mathscr{F}$. Define
$T(s_0, s_1, s_2, \ldots) = (s_1, s_2, \ldots)$; $T$ is called the one-sided
shift transformation. Measures preserved by $T$ are stationary in the sense of
Definition 8.1.3 below.

5. Two-sided shifts. Let $\Omega$ be the set of all doubly infinite sequences
$s = (\ldots, s_{-1}, s_0, s_1, \ldots)$ of real numbers, $\mathscr{F}$ the
$\sigma$-field generated by the measurable cylinders

\[ \{s: (s_k, s_{k+1}, \ldots, s_{k+n-1}) \in B_n\}, \]

$n = 1, 2, \ldots$, $k = 0, \pm 1, \pm 2, \ldots$, $B_n \in \mathscr{B}(\mathbb{R}^n)$.
Let $\mu$ be any probability measure $P$ on $\mathscr{F}$, and let $T$ be the
two-sided shift defined by

\[ T(\ldots, s_{-1}, s_0, s_1, \ldots) = (\ldots, s_0, s_1, s_2, \ldots). \]

In other words, if the $k$th coordinate of $s$ is $s_k$, the $k$th coordinate
of $T(s)$ is $s_{k+1}$. As in Example 4, measures preserved by $T$ are
stationary in the sense of Definition 8.1.3.


In Example 4, the coordinate variables are defined as follows. If
$\omega = (s_0, s_1, s_2, \ldots)$, then $X_k(\omega) = s_k$, $k = 0, 1, \ldots$.
(A similar definition is made in Example 5.) If $T$ is the one-sided shift (or,
in Example 5, the two-sided shift), we have

\[ X_k(T\omega) = X_{k+1}(\omega). \]
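The criterion of Example 1 can be checked mechanically on a small finite space. The following is a minimal sketch, not part of the original text: the function names `cycles`, `preserves`, and `constant_on_cycles` and the sample permutation and weights are illustrative assumptions. Since every subset of a finite set is a finite disjoint union of points, it is enough to compare $\mu(T^{-1}\{x\})$ with $\mu\{x\}$ for each point $x$.

```python
from fractions import Fraction

def cycles(perm):
    """Decompose a permutation (dict: point -> image) into disjoint cycles."""
    seen, out = set(), []
    for start in perm:
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        out.append(cyc)
    return out

def preserves(perm, mu):
    """True iff mu(T^{-1}{x}) == mu({x}) for every point x."""
    inverse = {image: point for point, image in perm.items()}
    return all(mu[inverse[x]] == mu[x] for x in perm)

def constant_on_cycles(perm, mu):
    """True iff mu gives equal weight to all points of each cycle."""
    return all(len({mu[x] for x in cyc}) == 1 for cyc in cycles(perm))

# Hypothetical example: T has the cycles (0 1 2) and (3 4).
perm = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3}
mu_good = {0: Fraction(1, 6), 1: Fraction(1, 6), 2: Fraction(1, 6),
           3: Fraction(1, 4), 4: Fraction(1, 4)}
mu_bad = {0: Fraction(1, 2), 1: Fraction(1, 8), 2: Fraction(1, 8),
          3: Fraction(1, 8), 4: Fraction(1, 8)}
```

In this sketch the two tests agree on both measures, as the example predicts: `mu_good` is constant on each cycle and is preserved, while `mu_bad` is neither.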


8.1.3 Definition. Let $P$ be a probability on $\mathbb{R}^\infty$; $P$ is
stationary iff

\[ P\{s: (s_0, s_1, \ldots, s_{n-1}) \in B_n\}
   = P\{s: (s_k, s_{k+1}, \ldots, s_{k+n-1}) \in B_n\} \]

for all $n, k = 1, 2, \ldots$, and all $n$-dimensional Borel sets $B_n$.
In the case of doubly infinite sequences, $k = 1, 2, \ldots$ is replaced by
$k = \pm 1, \pm 2, \ldots$.


We show that $T$ preserves $P$ iff $P$ is stationary.
First, note that

\[ T^{-k}\{s: (s_0, \ldots, s_{n-1}) \in B_n\} = \{s: (s_k, \ldots, s_{k+n-1}) \in B_n\}. \]

If $T$ preserves $P$, then

\[ P(A) = P(T^{-1}A) = \cdots = P(T^{-k}A), \quad A \in \mathscr{F}, \]

and it follows that $P$ is stationary.

Conversely, if $P$ is stationary and $A = \{s: (s_0, \ldots, s_{n-1}) \in B_n\}$
is a measurable cylinder, then $T^{-1}(A) = \{s: (s_1, \ldots, s_n) \in B_n\}$,
so that $P(T^{-1}A) = P(A)$ by stationarity with $k = 1$. The class of sets
$A \in \mathscr{F}$ such that $P(A) = P(T^{-1}A)$ is a monotone class containing
the measurable cylinders, and hence coincides with $\mathscr{F}$. The result
follows.


8.1.4 Definition. Let $(\Omega, \mathscr{F}, P)$ be a probability space and
$X_0, X_1, \ldots$ be a sequence of random variables. The sequence
$X_0, X_1, \ldots$ is a stationary sequence iff for any $n = 0, 1, \ldots$ and
any $k = 1, 2, \ldots$, $(X_0, \ldots, X_n)$ and $(X_k, \ldots, X_{n+k})$ have
the same distribution.


In the case of doubly infinite sequences $(\ldots, X_{-1}, X_0, X_1, \ldots)$,
$k = 1, 2, \ldots$ is replaced by $k = \pm 1, \pm 2, \ldots$.


If $X_0, X_1, \ldots$ is a sequence of random variables defined on a probability
space $(\Omega, \mathscr{F}, P)$ we can define on
$(\mathbb{R}^\infty, [\mathscr{B}(\mathbb{R})]^\infty)$ a probability $\mu$ as
follows: if $B_n$ is an $n$-dimensional Borel set, we take

\[ \mu_n\{s: (s_0, \ldots, s_{n-1}) \in B_n\}
   = P\{\omega: (X_0(\omega), \ldots, X_{n-1}(\omega)) \in B_n\}. \]

The probabilities $\mu_n$ are consistent in the sense of the Kolmogorov
extension theorem 2.7.5, and define a probability $\mu$ on
$(\mathbb{R}^\infty, [\mathscr{B}(\mathbb{R})]^\infty)$ characterized by

\[ \mu\{s: (s_0, \ldots, s_{n-1}) \in B_n\}
   = P\{\omega: (X_0(\omega), \ldots, X_{n-1}(\omega)) \in B_n\}. \]

It is easy to verify that the sequence $X_0, X_1, \ldots$ is stationary iff the
probability $\mu$ is a stationary probability on $\mathbb{R}^\infty$.

Note that the transformations of Examples 1, 2, 3, and 5 are invertible
(measurable, one-to-one, onto, with $T^{-1}$ measurable), while that of
Example 4 is not invertible (it is not one-to-one).

We now consider a physical example to motivate the concept of an ergodic
transformation. Suppose that rainfall data are collected at a very large number
of observation points $a_0, a_1, \ldots$ at times $t = 0, 1, \ldots$. Assume
that the statistical character of the observations at $a_i$ is the same for
all $i$. The observation at $a_i$ is represented by a stationary sequence of
random variables $X_{i0}, X_{i1}, \ldots$, with $X_{in}$ the amount of rainfall
at time $n$ at $a_i$. Assume also that the $a_i$ are “independent,” in other
words, the $a_i$ correspond to a sequence of independent performances of a
random experiment, where a performance means an observation of the entire
sequence $(X_{i0}, X_{i1}, \ldots)$.

Suppose that the problem is to measure the average rainfall. Scientist A 
might take the following approach. He or she might take measurements at 
each observation point at a given time, say t = 0, and average the results. 
Scientist B might reason as follows. Since all observation points have the
same statistical character, we can simply go to one observation point, take
a large number of observations, say at $t = 0, 1, \ldots, n - 1$, and average
the results. Scientist A is using what might be called a vertical measuring scheme,
and Scientist B a horizontal scheme, as illustrated in the table; A’s observations 
correspond to the first column, B’s to the first row. 


Observation point    t = 0       t = 1       t = 2       ...

$a_0$                $X_{00}$    $X_{01}$    $X_{02}$    ...
$a_1$                $X_{10}$    $X_{11}$    $X_{12}$    ...
$a_2$                $X_{20}$    $X_{21}$    $X_{22}$    ...


Now A and B will not necessarily obtain the same result (not even
“essentially” the same). For example, suppose that “nature” flips an unbiased
coin at each observation point. If the coin comes up heads, the rain is one
inch at each observation time; if the coin comes up tails, there is no rain at
any time. Roughly half of A’s observers will measure one inch of rainfall, and
half will measure none. Thus A will arrive at an average rainfall of one-half
inch. But B will either measure an average of one inch or no rain at all, and
thus will not get the same answer.
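The coin-flip example is easy to simulate. The sketch below is illustrative only; the function name `simulate`, the table sizes, and the seed are assumptions made for the illustration. It builds the table $X_{in}$, then forms Scientist A's average down the first column and Scientist B's average along the first row.

```python
import random

def simulate(num_points, num_times, seed=0):
    """Build table[i][n] = X_{in}: rainfall at point a_i at time n.

    One unbiased coin is flipped per observation point; heads (1) means
    one inch of rain at every time, tails (0) means no rain at any time.
    """
    rng = random.Random(seed)
    coins = [rng.randint(0, 1) for _ in range(num_points)]
    return [[coin] * num_times for coin in coins]

table = simulate(num_points=10_000, num_times=50)

# Scientist A: ensemble average over all observation points at time t = 0.
ensemble_avg = sum(row[0] for row in table) / len(table)

# Scientist B: time average at the single observation point a_0.
time_avg = sum(table[0]) / len(table[0])
```

Under these assumptions `ensemble_avg` should be close to one-half, while `time_avg` is exactly 0 or exactly 1, whichever way the coin at $a_0$ fell.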

Mathematically, B is computing a time average, namely (if we denote $X_{0k}$
by $X_k$),

\[ \frac{1}{n} \sum_{k=0}^{n-1} X_k(\omega) \quad \text{for a particular } \omega. \]

(Note also that if $T$ is the one-sided shift, then $X_k(\omega) = X_0(T^k\omega)$.)

But A is observing an ensemble average at a particular time, namely,
$(1/n) \sum_{i=0}^{n-1} Y_i$, where the $Y_i = X_{i0}$ are independent random
variables, all having the same distribution as $Y_0 = X_0$. Thus A’s result
would approximate $E(X_0) = \int_\Omega X_0\, dP$. For A and B to get the same
answer, we must have

\[ \frac{1}{n} \sum_{k=0}^{n-1} X_0(T^k\omega) \to \int_\Omega X_0\, dP. \]


More generally, we might ask when it will be true that, for each integrable
function $f$ on $(\Omega, \mathscr{F}, P)$, we have

\[ \frac{1}{n} \sum_{k=0}^{n-1} f(T^k\omega) \to \int_\Omega f\, dP \qquad (1) \]

at least for almost every $\omega$. In particular, if $f$ is an indicator
$I_A$, the property to be verified is simply the convergence of the relative
frequency of visits to $A$ in the first $n$ steps to the probability of $A$.

Now suppose that $A$ is an “almost invariant” set. In other words, $A$ and
$T^{-1}A$ differ only by a set of measure 0. (In the case where nature flips an
unbiased coin to determine rainfall, we may take $A = \{\omega: X_0(\omega) = 1\}$,
so that $T^{-1}A = \{\omega: X_1(\omega) = 1\}$.) Then we have, almost everywhere,

\[ \frac{1}{n} \sum_{k=0}^{n-1} I_A(T^k\omega) = I_A(\omega) \quad \text{for all } n. \]

Thus the relative frequency of visits to $A$ cannot converge to $P(A)$, except
when $P(A) = 0$ or 1. Conversely, the pointwise ergodic theorem, to be proved
in 8.3, implies that if every almost invariant set has probability 0 or 1, the
convergence result in (1) holds. In the next section we shall prepare for the
proof of this basic result.

One comment on terminology. In this chapter, we deal exclusively with real 
as opposed to complex-valued functions. 




8.2  ERGODICITY AND MIXING 
The following definitions are motivated by the analysis at the end of 8.1. 


8.2.1 Definition. Let $T$ be a measure-preserving transformation on
$(\Omega, \mathscr{F}, \mu)$. A set $A \in \mathscr{F}$ is said to be invariant
(under $T$) iff $A = T^{-1}A$, that is, $\omega \in A$ iff $T\omega \in A$;
almost invariant iff $A$ and $T^{-1}A$ differ by a set of measure 0, in other
words, $\mu(A \,\Delta\, T^{-1}A) = 0$.

It is easily checked that the invariant sets form a $\sigma$-field, as do the
almost invariant sets.


If $g: (\Omega, \mathscr{F}) \to (\mathbb{R}, \mathscr{B}(\mathbb{R}))$, $g$ is
said to be invariant iff $g(T\omega) = g(\omega)$ for all $\omega$, almost
invariant iff $g(T\omega) = g(\omega)$ for almost all $\omega$. Note that a set
is invariant (respectively almost invariant) iff its indicator is invariant
(almost invariant).

The measure-preserving transformation $T$ is said to be ergodic iff for every
invariant set $A$, either $\mu(A) = 0$ or $\mu(\Omega - A) = 0$. In the case of
a probability space, ergodicity means that each invariant set has probability
0 or 1.

Invariance may be replaced by almost invariance in the definition of ergod- 
icity, as the following result shows. 


8.2.2 Lemma. Let T be a measure-preserving transformation. 


(a) If $A$ is an almost invariant set, there is a (strictly) invariant set $B$
such that $\mu(A \,\Delta\, B) = 0$.

(b) A measure-preserving transformation $T$ is ergodic iff for each almost
invariant set $A$, either $\mu(A) = 0$ or $\mu(\Omega - A) = 0$.


Proof. Take $B = \limsup_n T^{-n}A$. Then
$T^{-1}B = \limsup_n T^{-(n+1)}A = B$, hence $B$ is invariant.

Now $A \,\Delta\, B \subset \bigcup_{k=0}^{\infty} (T^{-k}A \,\Delta\, T^{-(k+1)}A)$;
for if $\omega \in A - B$ then $\omega \in T^{-n}A$ for only finitely many $n$,
including $n = 0$. Thus $\omega \in T^{-k}A - T^{-(k+1)}A$ for some $k$. If
$\omega \in B - A$, then $T^n\omega \in A$ for infinitely many $n$, but
$\omega \notin A$. If $k + 1$ is the smallest integer such that
$T^{k+1}\omega \in A$, then $\omega \in T^{-(k+1)}A - T^{-k}A$.

Since $\mu(T^{-k}A \,\Delta\, T^{-(k+1)}A) = \mu(A \,\Delta\, T^{-1}A) = 0$, it
follows that $\mu(A \,\Delta\, B) = 0$, proving (a).

To prove (b), let $T$ be ergodic, and let $A$ be almost invariant. If $B$ is
invariant and $\mu(A \,\Delta\, B) = 0$, then $\mu(A) = \mu(B)$ and
$\mu(\Omega - A) = \mu(\Omega - B)$; hence $\mu(A) = 0$ or
$\mu(\Omega - A) = 0$. The converse is clear since every invariant set is
almost invariant. □


We give another way of expressing ergodicity.

8.2.3 Lemma. Let $T$ be a measure-preserving transformation on
$(\Omega, \mathscr{F}, \mu)$. The following conditions are equivalent:

(a) T is ergodic. 

(b) Every almost invariant function is a.e. constant. 

(c) Every invariant function is a.e. constant. 


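Proof. (a) implies (b): Let $g$ be an almost invariant function. Then for each
real $\lambda$, $A_\lambda = \{\omega: g(\omega) < \lambda\}$ is an almost
invariant set since $g(\omega) = g(T\omega)$ a.e. By (a) and 8.2.2,
$\mu(A_\lambda) = 0$ or $\mu(A_\lambda^c) = 0$. Let
$c = \sup\{\lambda: \mu(A_\lambda) = 0\}$; $c$ is finite (ignoring the trivial
case $\mu \equiv 0$) since $A_\lambda \uparrow \Omega$ as
$\lambda \uparrow \infty$, and $A_\lambda^c \uparrow \Omega$ as
$\lambda \downarrow -\infty$. Then

\[ \mu\{\omega: g(\omega) < c\}
   = \mu\Big( \bigcup_{n=1}^{\infty} \Big\{\omega: g(\omega) < c - \frac{1}{n}\Big\} \Big) = 0, \]

and similarly $\mu\{\omega: g(\omega) > c\} = 0$. Thus $g = c$ a.e.

(b) implies (c): Every invariant function is almost invariant.

(c) implies (a): If $A$ is an invariant set, $I_A$ is an invariant function,
hence $I_A$ is a.e. constant. If $I_A = 0$ a.e. then
$\mu(A) = \int_\Omega I_A\, d\mu = 0$, and if $I_A = 1$ a.e. then
$\mu(A^c) = \int_\Omega (1 - I_A)\, d\mu = 0$. □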


Invariance and almost invariance can be defined in the same way for 
extended real-valued Borel measurable functions. Lemma 8.2.3 holds in this 
case also, with essentially the same proof. 

The following characterization of almost invariance is often useful. 


8.2.4 Lemma. Let $T$ be a measure-preserving transformation. Assume $\mu$
is finite. A set $A \in \mathscr{F}$ is almost invariant iff either
$\mu(T^{-1}A - A) = 0$ or $\mu(A - T^{-1}A) = 0$, that is, $T\omega \in A$
essentially implies $\omega \in A$, or $\omega \in A$ essentially implies
$T\omega \in A$.


Proof. We may write

\[ \mu(A - T^{-1}A) = \mu(A) - \mu(A \cap T^{-1}A)
   = \mu(T^{-1}A) - \mu(A \cap T^{-1}A) = \mu(T^{-1}A - A). \]

Thus $\mu(A \,\Delta\, T^{-1}A) = 2\mu(A - T^{-1}A) = 2\mu(T^{-1}A - A)$. □


Nonergodicity, that is, the existence of a nontrivial invariant set, indicates that
$T$ does not completely stir up the space. This concept of “stirring” may be
developed as follows.

8.2.5 Definition. If $T$ is a measure-preserving transformation on the
probability space $(\Omega, \mathscr{F}, P)$, $T$ is said to be mixing iff for
all $A, B \in \mathscr{F}$,

\[ \lim_{n \to \infty} P(A \cap T^{-n}B) = P(A)P(B). \]

The restriction to a probability measure is essential here. If, for example,
$T$ has the mixing property with respect to the measure $\mu$, let
$A = B = \Omega$ to obtain $\mu(\Omega) = [\mu(\Omega)]^2$. Thus if
$\mu(\Omega)$ is finite it must be 1; if $\mu(\Omega) = \infty$, and $A$ is a
set in $\mathscr{F}$ with finite, strictly positive measure, take
$B = \Omega$. Then $\mu(A \cap T^{-n}B) = \mu(A) < \infty$, but
$\mu(A)\mu(B) = \infty$, a contradiction.


The mixing property has the following intuitive interpretation [this example
is due to Halmos (1956)]. Regard the transformation $T$ as defining a flow,
as in the discussion after 8.1.1. Suppose, for example, that initially the
container is filled with a liquid that is 90% gin, 10% vermouth, the “vermouth
particles” occupying the set $A$, the “gin particles” the set $\Omega - A$. The
externally applied force is due to a swizzle stick. The condition of the
container is observed at times $t = 0, 1, 2, \ldots$. If $B$ is any Borel
subset of the container, let $P(B)$ be the volume of $B$ divided by the volume
of the container (so that $P(A) = 0.1$). It is reasonable to expect that if the
mixing process is continued long enough, the percentage of vermouth in $B$
should be approximately the same as the percentage in the entire container,
namely, 10%. To translate this into mathematical terms note that if $\omega$
is a point of the container, and a particle is initially at $\omega$, then
$T^n\omega$ is the position of the particle $n$ seconds later. Thus the set of
vermouth particles that are in $B$ at time $t = n$ is
$\{\omega \in A: T^n\omega \in B\} = A \cap T^{-n}B$. The fraction of vermouth
in $B$ at time $t = n$ is $P(A \cap T^{-n}B)/P(B)$, and the mixing property is
expressed by saying that $P(A \cap T^{-n}B)/P(B) \to P(A) = 0.1$.

Mixing is a stronger property than ergodicity, as we now prove. 


8.2.6 Theorem. Let $T$ be a mixing transformation on $(\Omega, \mathscr{F}, P)$.
Then $T$ is ergodic.


Proof. Let $B$ be an invariant set. If $A \in \mathscr{F}$, then since
$B = T^{-n}B$, we have $P(A \cap B) = P(A \cap T^{-n}B)$ for all $n$. If we let
$n \to \infty$ and invoke the mixing property, we obtain
$P(A \cap B) = P(A)P(B)$. But since $A$ is an arbitrary set in $\mathscr{F}$,
we may take $A = B$, and thus $P(B) = [P(B)]^2$, hence $P(B) = 0$ or 1. □


It is useful to observe that it is not necessary to verify the mixing condition
for all the sets $A, B \in \mathscr{F}$, but only for $A, B$ in a field
$\mathscr{F}_0$ whose minimal $\sigma$-field is $\mathscr{F}$.


8.2.7 Theorem. Let $T$ be a measure-preserving transformation on
$(\Omega, \mathscr{F}, P)$. Let $\mathscr{F}_0$ be a field of subsets of
$\Omega$ such that the $\sigma$-field generated by $\mathscr{F}_0$ is
$\mathscr{F}$. If the mixing condition holds for all $A, B \in \mathscr{F}_0$,
it holds for all $A, B \in \mathscr{F}$ and hence $T$ is mixing.


Proof. Let $A, B \in \mathscr{F}$, and find sets $A_k, B_k \in \mathscr{F}_0$
($k = 1, 2, \ldots$) such that $P(A \,\Delta\, A_k)$ and
$P(B \,\Delta\, B_k) \to 0$ as $k \to \infty$ (see 1.3.11). Now

\[ (A \cap T^{-n}B) \,\Delta\, (A_k \cap T^{-n}B_k)
   \subset (A \,\Delta\, A_k) \cup (T^{-n}(B \,\Delta\, B_k)), \]

so the probability of the set on the left is at most
$P(A \,\Delta\, A_k) + P(B \,\Delta\, B_k)$, which approaches 0 as
$k \to \infty$, uniformly in $n$. Thus
$P(A_k \cap T^{-n}B_k) \to P(A \cap T^{-n}B)$ as $k \to \infty$, uniformly in
$n$. By the hypothesis, $P(A_k \cap T^{-n}B_k) \to P(A_k)P(B_k)$ as
$n \to \infty$. Therefore by the standard double limit theorem,

\[ \lim_{n \to \infty} P(A \cap T^{-n}B)
   = \lim_{n \to \infty} \lim_{k \to \infty} P(A_k \cap T^{-n}B_k)
   = \lim_{k \to \infty} \lim_{n \to \infty} P(A_k \cap T^{-n}B_k)
   = P(A)P(B), \]

the desired result. □


8.2.8 Examples. We consider again the examples of 8.1.2. 


1. Permutations. Let $T$ be a permutation of $\Omega = \{x_1, \ldots, x_n\}$,
$n \ge 2$, $\mu$ any measure on all subsets of $\Omega$ that assigns equal
weight to each point within a given cycle of $T$. (Assume that
$\mu\{x_i\} > 0$ for all $i$; when talking about the mixing property we also
assume $\mu$ is a probability measure.)


We claim that $T$ is ergodic iff $T$ has only one cycle; this follows because
the only invariant sets are unions of cycles of $T$ (and the empty set).


But $T$ is never mixing. Suppose that $\{x_1, \ldots, x_k\}$ is a cycle of $T$
and $Tx_i = x_{i+1}$, with indices reduced modulo $k$. Assume $k \ge 2$; if
$k = 1$, $\{x_1\}$ is a nontrivial invariant set. Let $A = B = \{x_1\}$. Then
$A \cap T^{-n}B$ coincides with $A$ if $n$ is a multiple of $k$, and is the
empty set otherwise. Thus $\lim_n \mu(A \cap T^{-n}B)$ does not exist.


2. Translations. Let $T(x) = x + c$ on $\mathbb{R}$, with Borel sets and
Lebesgue measure. Then $T$ is not ergodic;
$A = \bigcup_{n=-\infty}^{\infty} (nc, nc + c/2)$ is a nontrivial invariant set.

3. Rotations of the circle. Let $T$ be rotation by $\alpha$ on the unit circle
(or $T(\theta) = \theta + \alpha$ (mod $2\pi$) on $[0, 2\pi)$). We claim that
$T$ is ergodic iff $\alpha/2\pi$ is irrational, that is, iff $e^{i\alpha}$ is
not a root of unity.

Assume $\alpha/2\pi$ irrational, and let $A$ be an invariant set in
$\mathscr{F}$. Let $a_n$ be the $n$th Fourier coefficient of the indicator
function $I_A$, that is,

\[ a_n = \frac{1}{2\pi} \int_0^{2\pi} I_A(\omega) e^{-in\omega}\, d\omega
       = \int_A e^{-in\omega}\, dP(\omega). \]

By 1.6.12,

\[ a_n = \int_{T^{-1}A} e^{-in(\omega + \alpha)}\, dP(\omega)
       = e^{-in\alpha} \int_A e^{-in\omega}\, dP(\omega) = e^{-in\alpha} a_n \]

by invariance of $A$. Since $\alpha/2\pi$ is irrational, $e^{-in\alpha} \ne 1$
for $n \ne 0$, and it follows that $a_n = 0$ for $n \ne 0$. But
$I_A \in L^2$, so that the Fourier series
$\sum_{n=-\infty}^{\infty} a_n e^{in\omega}$ converges in $L^2$ to
$I_A(\omega)$ (see 3.2, Problem 9). It follows that $I_A = a_0$ a.e., and
therefore $P(A) = 0$ or 1. Thus $T$ is ergodic.

Conversely, if $e^{i\alpha}$ is a root of unity, say, $e^{in\alpha} = 1$, then
$\alpha$ is an integral multiple of $2\pi/n$. Let $A$ be the union of the
sectors $0 \le \theta < \pi/n$, $2\pi/n \le \theta < 3\pi/n$,
$4\pi/n \le \theta < 5\pi/n$, ..., $(2n - 2)\pi/n \le \theta < (2n - 1)\pi/n$.
Then $A$ is invariant, but $P(A) = \frac{1}{2}$, so that $T$ is not ergodic.

Now $T$ is never mixing; to establish this we may assume that $\alpha/2\pi$
is irrational. Let $A = B = \{\theta: 0 \le \theta < \pi\}$, corresponding to
the upper semicircle. Given $\varepsilon > 0$, $e^{in\alpha}$ is within
distance $\varepsilon$ of $e^{i0} = 1$ for infinitely many $n$. (Extract a
convergent subsequence $\{z_k\}$ from $\{e^{in\alpha}\}$, and select $z_i$ and
$z_{i+j}$ such that $\mathrm{dist}(z_i, z_{i+j}) < \varepsilon$; $i$ can be
chosen larger than any preassigned positive integer. Then it is possible to
form a chain that eventually goes entirely around the circle, with the
distance between successive points less than $\varepsilon$.) It follows that
$A$ and $T^nA$ overlap except for a set of measure less than $\varepsilon$.
Thus

\[ P(A \cap T^{-n}B) = P(A \cap T^{-n}A) = P(T^nA \cap A) \]

since $T$ is measure-preserving and invertible

\[ \ge P(A) - \varepsilon = \tfrac{1}{2} - \varepsilon > \tfrac{1}{3}
   \quad \text{if } \varepsilon < \tfrac{1}{6}. \]

But $P(A)P(B) = [P(A)]^2 = \frac{1}{4}$, so the mixing property fails.

4. One-sided and two-sided stationary processes. Let $T$ be the one-sided
(or two-sided) shift transformation on the space of all infinite (or doubly
infinite) sequences of real numbers. We consider only the case in which
the coordinate random variables $X_i$ are independent, that is, the measure
$P$ has the property that

\[ P\{\omega: X_i(\omega) \in A_i,\ i = 1, 2, \ldots, n\}
   = \prod_{i=1}^{n} P\{\omega: X_i(\omega) \in A_i\} \]

for all real Borel sets $A_1, \ldots, A_n$ and all $n = 1, 2, \ldots$. In this
case, $T$ is mixing (hence ergodic). Let

\[ A = \{\omega: (X_0(\omega), \ldots, X_{k-1}(\omega)) \in B_k\} \]

and

\[ B = \{\omega: (X_0(\omega), \ldots, X_{r-1}(\omega)) \in B_r'\} \]

be measurable cylinders. For sufficiently large $n$ we have $n > k - 1$, and
therefore the indices defining the sets $A$ and $T^{-n}B$ are distinct. Thus by
independence,

\[ P(A \cap T^{-n}B)
   = P\{\omega: (\omega_0, \ldots, \omega_{k-1}) \in B_k,\
        (\omega_n, \ldots, \omega_{n+r-1}) \in B_r'\}
   = P(A)P(T^{-n}B) = P(A)P(B). \]

Thus the mixing condition holds for all measurable cylinders, and hence
by 8.2.7, the mixing condition holds for all $A, B \in \mathscr{F}$, so that
$T$ is mixing.
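The cylinder computation in Example 4 can be verified exactly in the simplest special case. The sketch below assumes fair coin tossing, that is, independent coordinates taking the values 0 and 1 with probability 1/2 each; this is one particular product measure, chosen here only for illustration. The function name `prob_A_and_shifted_B` and the two sample cylinders are likewise assumptions. The probability $P(A \cap T^{-n}B)$ is computed by enumerating all binary strings of the required length.

```python
from fractions import Fraction
from itertools import product

def prob_A_and_shifted_B(in_A, k, in_B, r, n):
    """Exact P(A ∩ T^{-n}B) for the one-sided shift under fair coin flips.

    A is the cylinder {(s_0, ..., s_{k-1}) satisfies in_A} and
    B is the cylinder {(s_0, ..., s_{r-1}) satisfies in_B}, so that
    T^{-n}B constrains the coordinates s_n, ..., s_{n+r-1}.
    """
    length = max(k, n + r)
    hits = sum(1 for s in product((0, 1), repeat=length)
               if in_A(s[:k]) and in_B(s[n:n + r]))
    return Fraction(hits, 2 ** length)

in_A = lambda s: s[0] == 1 and s[1] == 1   # A: s_0 = s_1 = 1, P(A) = 1/4
in_B = lambda s: s[0] == 1                 # B: s_0 = 1,       P(B) = 1/2

values = [prob_A_and_shifted_B(in_A, 2, in_B, 1, n) for n in range(6)]
```

For $n = 0$ and $n = 1$ the coordinates defining $A$ and $T^{-n}B$ overlap and the value is 1/4; for $n \ge 2$ the indices are distinct and the value is exactly $P(A)P(B) = 1/8$, in agreement with the argument above.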


Problems 

1. If $A$ is an almost invariant set, show that for almost every $\omega$,
$\omega \in A$ iff $T^n\omega \in A$ for all $n = 1, 2, \ldots$.

2. If $\mu$ is counting measure on the integers, show that
$\omega \to \omega + 1$ is ergodic, but $\omega \to \omega + 2$ is not.

3. Let $T$ be a measure-preserving transformation on
$(\Omega, \mathscr{F}, \mu)$. Give examples to show that the following results
are possible:
(a) $A \in \mathscr{F}$ does not imply $T(A) \in \mathscr{F}$.
(b) If $A \in \mathscr{F}$ and $T(A) \in \mathscr{F}$, $\mu(A)$ need not equal
$\mu(TA)$.
(c) If $T$ is one-to-one onto, it need not be invertible (that is, $T^{-1}$
need not be measurable).

4. (Jacobs, 1962) Let $T$ be a measurable (but not necessarily
measure-preserving) transformation on $(\Omega, \mathscr{F}, \mu)$. We say
that $T$ is recurrent iff for every $A \in \mathscr{F}$ and almost every
$\omega \in A$, $T^n\omega \in A$ for some $n \ge 1$; $T$ is infinitely
recurrent iff for every $A \in \mathscr{F}$ and almost every $\omega \in A$,
$T^n\omega \in A$ for infinitely many $n$. (In these definitions, the
exceptional sets of measure 0 are allowed to depend on $A$.) A set
$B \in \mathscr{F}$ is wandering iff $B, T^{-1}B, T^{-2}B, \ldots$ are
disjoint; $T$ is said to be conservative iff all wandering sets have measure
0. Finally, $T$ is incompressible iff $A \subset T^{-1}A$ implies
$\mu(T^{-1}A - A) = 0$. (Show that equivalently, $T^{-1}A \subset A$ implies
$\mu(A - T^{-1}A) = 0$.)
(a) Show that the following are equivalent:
i. $T$ is incompressible;

ii. $T$ is conservative;

iii. $T$ is recurrent;

iv. $T$ is infinitely recurrent.


(One possible scheme is (i) ⇒ (ii) ⇒ (iii) ⇒ (i), (iv) ⇒ (iii),
(i) ⇒ (iv).)
(b) Show that $T(x) = x + 1$ on $\mathbb{R}$ (with Borel sets and Lebesgue
measure) violates all four conditions of (a).


(c) (Poincaré) If $T$ is measure-preserving and $\mu$ is finite, show that $T$
is (infinitely) recurrent.


8.3 THE POINTWISE ERGODIC THEOREM

We will prove the pointwise ergodic theorem, which states that if $T$ is a
measure-preserving transformation on $(\Omega, \mathscr{F}, \mu)$ and
$f \in L^1(\Omega, \mathscr{F}, \mu)$, then
$n^{-1}(f(\omega) + f(T\omega) + \cdots + f(T^{n-1}\omega))$ converges to an
integrable function $\hat{f}(\omega)$, for almost all $\omega$.

It is possible to prove this result in a somewhat more general form (see 
the comments at the end of the chapter). The generalization is based on the 
fact that associated with a measure-preserving transformation is a positive 
contraction operator, as follows. 


8.3.1 Theorem. If $f$ is an extended real-valued Borel measurable function
on $(\Omega, \mathscr{F}, \mu)$, and $T$ is $\mu$-preserving, let $\hat{T}f$
denote the function $f \circ T$. Then for every $p \in (0, \infty]$ we have
$\|\hat{T}f\|_p = \|f\|_p$. Thus if we consider $\hat{T}$ as a linear operator
on the Banach space $L^p(\Omega, \mathscr{F}, \mu)$, $1 \le p \le \infty$,
then $\hat{T}$ is an isometry (a one-to-one, linear, norm-preserving map), in
particular, $\hat{T}$ is a contraction, that is, $\|\hat{T}\| \le 1$ (in this
case, $\|\hat{T}\| = 1$). Furthermore, $\hat{T}$ is positive, that is, if
$f \ge 0$ a.e., then $\hat{T}f \ge 0$ a.e.


Proof. By 1.6.12,
$\int_\Omega |f(T\omega)|^p\, d\mu(\omega) = \int_\Omega |f(\omega)|^p\, d\mu(\omega)$,
hence $\|\hat{T}f\|_p = \|f\|_p$, $0 < p < \infty$. If $p = \infty$, we have

\[ \|\hat{T}f\|_\infty = \inf\{c: \mu\{|\hat{T}f| > c\} = 0\} \]
\[ = \inf\{c: \mu\{\omega: |f(T\omega)| > c\} = 0\} \]
\[ = \inf\{c: \mu(T^{-1}\{\omega: |f(\omega)| > c\}) = 0\} \]
\[ = \inf\{c: \mu\{|f| > c\} = 0\} \quad \text{since } T \text{ preserves } \mu \]
\[ = \|f\|_\infty. \]

If $\mu(N) = 0$, and $f \ge 0$ on $N^c$, then $\mu(T^{-1}N) = 0$ and
$\hat{T}f \ge 0$ on $(T^{-1}N)^c$, proving positivity. □


The sequence of averages
$f^{(n)}(\omega) = n^{-1}[f(\omega) + f(T\omega) + \cdots + f(T^{n-1}\omega)]$
can now be expressed in terms of $\hat{T}$ as

\[ f^{(n)} = n^{-1}(f + \hat{T}f + \hat{T}^2 f + \cdots + \hat{T}^{n-1} f) \]

where $\hat{T}^0 f = f$ and $\hat{T}^k$ is the composition of $\hat{T}$ with
itself $k$ times, $k \ge 1$.
In the three results to follow, $T$ is a measure-preserving transformation
on $(\Omega, \mathscr{F}, \mu)$, and
$f: (\Omega, \mathscr{F}) \to (\mathbb{R}, \mathscr{B}(\mathbb{R}))$. Inspection
of the proofs will show, however, that the results hold if $\hat{T}$ is
replaced by an arbitrary positive contraction on $L^1$, not necessarily arising
from a measure-preserving transformation.


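8.3.2 Lemma. If $f \in L^1(\Omega, \mathscr{F}, \mu)$, let $f_0 = f$, and

\[ f_n = \max(f,\ f + \hat{T}f,\ \ldots,\ f + \hat{T}f + \cdots + \hat{T}^n f), \quad n \ge 1. \]

Then

\[ f_{n+1} \le f + \hat{T}f_n^+, \quad n = 0, 1, \ldots. \]


Proof. If $0 \le m \le n$, then
$\sum_{k=0}^{m+1} \hat{T}^k f = f + \hat{T}\big(\sum_{k=0}^{m} \hat{T}^k f\big)$.
Since $\sum_{k=0}^{m} \hat{T}^k f \le f_n \le f_n^+$, and $\hat{T}$ is
positive, we have

\[ \hat{T}\Big(\sum_{k=0}^{m} \hat{T}^k f\Big) \le \hat{T}f_n \le \hat{T}f_n^+. \]

Thus

\[ \sum_{k=0}^{m+1} \hat{T}^k f \le f + \hat{T}f_n^+, \quad 0 \le m \le n. \]

Since $f \le f + \hat{T}f_n^+$, we have $f_{n+1} \le f + \hat{T}f_n^+$, as
desired. □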


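8.3.3 Lemma. Let $f_n$ be defined as in 8.3.2, and assume
$f \in L^1(\Omega, \mathscr{F}, \mu)$. If $A_n = \{f_n > 0\}$, then
$\int_{A_n} f\, d\mu \ge 0$.


Proof. By 8.3.2,

\[ \int_{A_n} f\, d\mu \ge \int_{A_n} f_{n+1}\, d\mu - \int_{A_n} \hat{T}f_n^+\, d\mu \]
\[ \ge \int_{A_n} f_n\, d\mu - \int_{A_n} \hat{T}f_n^+\, d\mu \]
\[ = \int_{\Omega} f_n^+\, d\mu - \int_{A_n} \hat{T}f_n^+\, d\mu \]
\[ \ge \int_{\Omega} f_n^+\, d\mu - \int_{\Omega} \hat{T}f_n^+\, d\mu \]
\[ = \|f_n^+\|_1 - \|\hat{T}f_n^+\|_1 \ge 0 \]

since $\hat{T}$ is a contraction. □

8.3.4 Maximal Ergodic Theorem. If $f \in L^1(\Omega, \mathscr{F}, \mu)$ and

\[ A = \Big\{\omega: \sup_{n \ge 1} \sum_{k=0}^{n-1} (\hat{T}^k f)(\omega) > 0\Big\}
     = \Big\{\omega: \sup_{n \ge 1} f^{(n)}(\omega) > 0\Big\},
   \quad \text{where } f^{(n)} = n^{-1} \sum_{k=0}^{n-1} \hat{T}^k f, \]

then $\int_A f\, d\mu \ge 0$.
Proof. The sets $A_n$ of 8.3.3 increase to $A$, so the result follows from
8.3.3 by the dominated convergence theorem. □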
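Lemma 8.3.3 can be tested numerically in a finite setting. The sketch below is an illustration under stated assumptions, not part of the original development: $\Omega = \{0, \ldots, m-1\}$ with counting measure, $T$ a randomly chosen permutation (which preserves counting measure), and $f$ an arbitrary integer-valued function. The function name `maximal_set_integrals`, the size $m$, and the seed are hypothetical choices. For each $n$ it forms $A_n = \{f_n > 0\}$ and computes $\int_{A_n} f\, d\mu$, which here is a finite sum.

```python
import random

def maximal_set_integrals(f, T, n_max):
    """Return the integral of f over A_n = {f_n > 0} for n = 0..n_max.

    f is a list of values, T a list with T[x] the image of x, and the
    measure is counting measure. f_n(w) is the maximum of the partial sums
    f(w) + f(Tw) + ... + f(T^j w) over j = 0..n.
    """
    results = []
    for n in range(n_max + 1):
        total = 0
        for w in range(len(f)):
            partial, best, x = 0, None, w
            for _ in range(n + 1):
                partial += f[x]
                best = partial if best is None else max(best, partial)
                x = T[x]
            if best > 0:          # w belongs to A_n
                total += f[w]
        results.append(total)
    return results

rng = random.Random(1)
m = 12
T = list(range(m))
rng.shuffle(T)                     # a random permutation of {0, ..., m-1}
f = [rng.randint(-5, 5) for _ in range(m)]
integrals = maximal_set_integrals(f, T, 3 * m)
```

Every entry of `integrals` should be nonnegative, whatever the signs of the individual values of `f`; this is the content of the lemma in this finite case.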


It will be convenient to isolate some of the technical difficulties in the
proof of the pointwise ergodic theorem. Let $f \in L^1(\Omega, \mathscr{F}, \mu)$,
and let $f^{(n)}(\omega) = n^{-1} \sum_{k=0}^{n-1} f(T^k\omega)$,
$n = 1, 2, \ldots$, where $T$ is $\mu$-preserving. If $a < b$, define

\[ C_{ab} = C_{ab}(f) = \Big\{\omega: \liminf_{n \to \infty} f^{(n)}(\omega) < a < b
   < \limsup_{n \to \infty} f^{(n)}(\omega)\Big\}, \]

\[ N_b = \Big\{\omega: \sup_{n \ge 1} f^{(n)}(\omega) > b\Big\}; \]

note that $C_{ab}$ is a subset of $N_b$.

Just as in 6.4.3 we can establish a.e. convergence of the sequence $f^{(n)}$ if
we show that $\mu(C_{ab})$ is always 0. To do this, we may assume without loss
of generality that $b > 0$. Note that

\[ C_{ab}(f) = \{\omega: \liminf (-f)^{(n)}(\omega) < -b < -a < \limsup (-f)^{(n)}(\omega)\}
   = C_{-b,-a}(-f). \]

If $b \le 0$, then $-a > 0$, and the argument below will show that
$C_{-b,-a}(-f)$ has measure 0, and thus $\mu(C_{ab}) = 0$.


8.3.5 Lemma. The set $C_{ab}$ has the following properties:


(a) The set $C_{ab}$ is almost invariant.

(b) $\mu(C_{ab}) < \infty$.

(c) In fact $\mu(C_{ab}) = 0$.

Proof. (a) We may write

\[ f^{(n)}(T\omega) = \frac{1}{n} \sum_{k=0}^{n-1} f(T^{k+1}\omega)
   = \frac{n+1}{n} \Big[ \frac{1}{n+1} \sum_{k=0}^{n} f(T^k\omega) \Big] - \frac{f(\omega)}{n}
   = \frac{n+1}{n} f^{(n+1)}(\omega) - \frac{f(\omega)}{n}. \]

Since $f \in L^1$, $f(\omega)$ is finite for almost all $\omega$, hence

\[ \liminf f^{(n)}(\omega) = \liminf f^{(n)}(T\omega), \]

and similarly for lim sup, except possibly on a set of measure 0. Thus, outside
a set of measure 0, $\omega \in C_{ab}$ iff $T\omega \in C_{ab}$, and the result
follows.

(b) Let $C$ be any set in $\mathscr{F}$ such that $C \subset C_{ab}$ and
$\mu(C) < \infty$, and define
$F_b = \{\omega: \sup_{n \ge 1} (f - bI_C)^{(n)}(\omega) > 0\}$. Note that if
$\omega \in C_{ab}$, then $\omega \in N_b$, hence

\[ \frac{1}{n} \sum_{k=0}^{n-1} f(T^k\omega) > b \quad \text{for some } n. \]

Thus $\omega \in F_b$ (since $(bI_C)^{(n)} \le b$); in particular, $C$ is a
subset of $F_b$. By the maximal ergodic theorem 8.3.4,

\[ \int_{F_b} (f - bI_C)\, d\mu \ge 0. \]

Thus

\[ \int_\Omega |f|\, d\mu \ge \int_{F_b} f\, d\mu \ge b \int_{F_b} I_C\, d\mu
   = b\mu(C \cap F_b) = b\mu(C) \]

so that

\[ \mu(C) \le b^{-1} \int_\Omega |f|\, d\mu < \infty. \]

Now note that $C_{ab}$ is a subset of

\[ \bigcup_{n=0}^{\infty} \{\omega: |f(T^n\omega)| > 0\}. \]

By Theorem 1.6.12,

\[ \int_\Omega |f(T^n\omega)|\, d\mu(\omega) = \int_\Omega |f(\omega)|\, d\mu(\omega) < \infty, \]

and it follows that $\{\omega: |f(T^n\omega)| > 0\}$ is a countable union of
sets of finite measure, and therefore so is $C_{ab}$ (see 2.2, Problem 2). Thus

\[ \mu(C_{ab}) = \sup\{\mu(C): C \in \mathscr{F},\ C \subset C_{ab},\ \mu(C) < \infty\}
   \le b^{-1} \int_\Omega |f|\, d\mu < \infty. \]

(c) Since $C_{ab}$ is almost invariant, $T$ is a well-defined measure-preserving
transformation on $(C_{ab}, \mathscr{F}_{ab}, \mu_{ab})$, where

\[ \mathscr{F}_{ab} = \{A \in \mathscr{F}: A \subset C_{ab}\} = \{B \cap C_{ab}: B \in \mathscr{F}\}, \]

and $\mu_{ab} = \mu$ restricted to $C_{ab}$. (Strictly speaking, if
$D = C_{ab} \,\Delta\, T^{-1}C_{ab}$, then $T$ is well defined on $C_{ab} - D$.
Since $\mu(D) = 0$, this causes no difficulty. For example, we may redefine $T$
as the identity on $D$, and then it will be well defined and
measure-preserving on $C_{ab}$.)

The argument of part (b) may now be applied to $T$ on $C_{ab}$. In particular,
the inequality $\int_{F_b} (f - bI_C)\, d\mu \ge 0$ now becomes

\[ \int_{C_{ab} \cap F_b} (f - bI_C)\, d\mu \ge 0. \]

Since $C_{ab}$ has finite measure, we may set $C = C_{ab}$, and since
$C_{ab} \subset F_b$, we obtain

\[ \int_{C_{ab}} (f - b)\, d\mu \ge 0. \]

Now let

\[ F_a' = \Big\{\omega \in C_{ab}: \sup_{n \ge 1} (a - f)^{(n)}(\omega) > 0\Big\}. \]

If $\omega \in C_{ab}$, then $f^{(n)}(\omega) < a$ for at least one $n$, hence
$\omega \in F_a'$; thus $C_{ab} = F_a'$. Since $C_{ab}$ has finite measure,
constant functions are integrable on $C_{ab}$, and we may therefore apply the
maximal ergodic theorem to obtain

\[ \int_{C_{ab}} (a - f)\, d\mu \ge 0. \]

Thus we have, for $a < b$,

\[ b\mu(C_{ab}) \le \int_{C_{ab}} f\, d\mu \le a\mu(C_{ab}), \]

a contradiction unless $\mu(C_{ab}) = 0$. □

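We may now prove the main result.

8.3.6 Pointwise Ergodic Theorem. Let $T$ be a measure-preserving
transformation on $(\Omega, \mathscr{F}, \mu)$, and let
$f \in L^1(\Omega, \mathscr{F}, \mu)$. Then there is a function
$\hat{f} \in L^1$ such that we have
$n^{-1} \sum_{k=0}^{n-1} f(T^k\omega) \to \hat{f}(\omega)$ almost everywhere.

Proof. Let $D = \{\omega: f^{(n)}(\omega)$ does not converge to a finite or
infinite limit$\}$. Then $D = \bigcup\{C_{ab}(f): a < b,\ a, b$ rational$\}$.
By 8.3.5, $\mu(D) = 0$, and hence $f^{(n)}(\omega)$ converges for almost all
$\omega$; we call the limit $\hat{f}(\omega)$. (Define $\hat{f} = 0$ on the
exceptional set.) By Fatou’s lemma,

\[ \int_\Omega |\hat{f}|\, d\mu = \int_\Omega \lim_{n \to \infty} |f^{(n)}(\omega)|\, d\mu(\omega)
   \le \liminf_{n \to \infty} \int_\Omega |f^{(n)}(\omega)|\, d\mu(\omega). \]

But

\[ \int_\Omega |f^{(n)}|\, d\mu \le \frac{1}{n} \sum_{k=0}^{n-1} \int_\Omega |f(T^k\omega)|\, d\mu(\omega)
   = \frac{1}{n} \sum_{k=0}^{n-1} \int_\Omega |f|\, d\mu \quad \text{by 1.6.12} \]
\[ = \int_\Omega |f|\, d\mu < \infty, \]

and the theorem is proved. □

We now look more closely at the convergence of the sequence $\{f^{(n)}\}$.


8.3.7 Theorem. If $\mu(\Omega) < \infty$ and $f \in L^p$ ($1 \le p < \infty$),
then $\hat{f} \in L^p$ and

\[ f^{(n)} \xrightarrow{L^p} \hat{f}. \]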
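Proof. Since the finite-valued simple functions are dense in $L^p$, for each
$\varepsilon > 0$ there is a bounded measurable function $g$ such that
$\|f - g\|_p < \varepsilon$. If $f_k(\omega) = f(T^k\omega)$,
$g_k(\omega) = g(T^k\omega)$, and $|g| \le M$, then

\[ \|f^{(n)} - \hat{f}\|_p \le \Big\| \frac{1}{n} \sum_{k=0}^{n-1} (f_k - g_k) \Big\|_p
   + \Big\| \frac{1}{n} \sum_{k=0}^{n-1} g_k - \hat{g} \Big\|_p
   + \|\hat{g} - \hat{f}\|_p. \qquad (1) \]

Since $|n^{-1} \sum_{k=0}^{n-1} g_k| \le M$ (hence $|\hat{g}| \le M$ a.e.), the
second term on the right in (1) approaches zero as $n$ approaches $\infty$, by
the dominated convergence theorem. (The hypothesis that $\mu(\Omega) < \infty$
implies that the function constant at $M$ is integrable.) By 1.6.12,
$\|f_k - g_k\|_p = \|f - g\|_p < \varepsilon$, hence the first term is less
than $\varepsilon$. Now

\[ \int_\Omega |\hat{g} - \hat{f}|^p\, d\mu
   = \int_\Omega \lim_{n \to \infty} \Big| \frac{1}{n} \sum_{k=0}^{n-1} (g_k - f_k) \Big|^p d\mu \]
\[ \le \liminf_{n \to \infty} \int_\Omega \Big| \frac{1}{n} \sum_{k=0}^{n-1} (f_k - g_k) \Big|^p d\mu \]
\[ = \liminf_{n \to \infty} \Big\| \frac{1}{n} \sum_{k=0}^{n-1} (f_k - g_k) \Big\|_p^p < \varepsilon^p. \]

Thus $\|f^{(n)} - \hat{f}\|_p < 2\varepsilon$ for large enough $n$, and the
result follows. □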


If $p = 1$ in 8.3.7, the hypothesis that $\mu(\Omega) < \infty$ cannot be
dropped (Problem 1). Also, the result fails for $p = \infty$, even if
$\mu(\Omega) < \infty$ (Problem 2).

We can now identify the limit function $\hat{f}$. Theorem 8.3.9 indicates that
although the pointwise ergodic theorem can be presented without reference to
probability, some insight is lost in doing so.


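8.3.8 Lemma. If $f \in L^1$, then $\hat{f}$ is almost invariant. Thus if
$\mathscr{G}$ is the $\sigma$-field of almost invariant sets, then
$\hat{f}: (\Omega, \mathscr{G}) \to (\mathbb{R}, \mathscr{B}(\mathbb{R}))$.


Proof. If $f^{(n)}(\omega) \to \hat{f}(\omega)$ for $\omega \notin N$, where
$\mu(N) = 0$, then $f^{(n)}(T\omega) \to \hat{f}(T\omega)$ for
$\omega \notin T^{-1}N$, where $\mu(T^{-1}N) = 0$. But [see the proof of
8.3.5(a)]

\[ f^{(n)}(T\omega) = \frac{n+1}{n} f^{(n+1)}(\omega) - \frac{f(\omega)}{n} \]

and $f(\omega)/n \to 0$ a.e. since $f \in L^1$. Thus
$f^{(n)}(T\omega) \to \hat{f}(\omega)$ a.e., hence
$\hat{f}(\omega) = \hat{f}(T\omega)$ a.e., proving $\hat{f}$ almost invariant.
If $B \in \mathscr{B}(\mathbb{R})$ and $C = \{\omega: \hat{f}(\omega) \in B\}$,
then if $\hat{f}(\omega) = \hat{f}(T\omega)$ we have $\omega \in C$ iff
$T\omega \in C$. Thus $C$ is almost invariant, and the proof is complete. □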


8.3.9 Theorem. If $f \in L^1$, and $A$ is an almost invariant set of finite measure,
then $\int_A f\, d\mu = \int_A \hat{f}\, d\mu$. Thus in a probability space, $\hat{f} = E(f \mid \mathcal{G})$, where $\mathcal{G}$
is the σ-field of almost invariant sets.


Proof. Restrict $T$ and $\mu$ to the almost invariant set $A$. Since $\mu(A) < \infty$, we
have $L^1$ convergence (on $A$) by 8.3.7, hence $\int_A f^{(n)}\, d\mu \to \int_A \hat{f}\, d\mu$. But

\[
\int_A f^{(n)}\, d\mu = \int_A f\, d\mu \quad \text{by 1.6.12.}
\]


In the ergodic case, $\hat{f}$ assumes a very special form.


8.3.10 Theorem. If $T$ is ergodic and $f \in L^1$, then $\hat{f}$ is constant a.e. If
$\mu(\Omega) = \infty$, the constant is $c = 0$; if $\mu(\Omega) < \infty$, we have

\[
c = \frac{1}{\mu(\Omega)} \int_\Omega f\, d\mu.
\]

Thus on a probability space, $\hat{f} = E(f)$ a.e.


Proof. By 8.3.8, $\hat{f}$ is almost invariant, so by 8.2.3(b), $\hat{f} = c$ a.e. If
$\mu(\Omega) = \infty$, then $c$ must be 0 because $\hat{f} \in L^1$ by 8.3.6. If $\mu(\Omega) < \infty$, then
$c = [\mu(\Omega)]^{-1} \int_\Omega f\, d\mu$ by 8.3.9 (with $A = \Omega$).


If $T$ is ergodic and $\mu$ is finite, consider the case $f = I_A$. Then by 8.3.10,
$\hat{f} = \mu(A)/\mu(\Omega)$ a.e., so that $n^{-1} \sum_{k=0}^{n-1} I_A(T^k\omega)$, the relative frequency of
visits to $A$, converges a.e. to the relative mass of $A$ (the probability of $A$
if $\mu(\Omega) = 1$). Apparently we have a version of the strong law of large
numbers, and in fact the pointwise ergodic theorem can be regarded as a
generalization of this result. Let $T$ be the one-sided shift transformation (see
8.1.2 Example 4 and 8.2.8 Example 4), with coordinate random variables $X_n$.
If $Z$ is an integrable random variable on $(\Omega, \mathcal{F}, P)$, then

\[
Z^{(n)}(\omega) = n^{-1}[Z(\omega_0, \omega_1, \ldots) + Z(\omega_1, \omega_2, \ldots) + \cdots + Z(\omega_{n-1}, \omega_n, \ldots)];
\]

in other words,

\[
Z^{(n)} = n^{-1}[Z(X_0, X_1, \ldots) + Z(X_1, X_2, \ldots) + \cdots + Z(X_{n-1}, X_n, \ldots)].
\]

By 8.3.9,

\[
Z^{(n)} \to E(Z \mid \mathcal{G}) \quad \text{a.e.,}
\]

where $\mathcal{G}$ is the σ-field of almost invariant sets; if $T$ is ergodic, the limit is
$E(Z)$ by 8.3.10. In particular, let $X_0, X_1, \ldots$ be iid random variables with finite
expectation. If $Z(\omega) = \omega_0$, that is, $Z = X_0$, we obtain

\[
n^{-1}(X_0 + \cdots + X_{n-1}) \to E(X_0) \quad \text{a.e.,}
\]

the iid case of the strong law of large numbers (see 6.2.5).
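
The averages $Z^{(n)}$ for the one-sided shift are easy to examine numerically. The following Python sketch is only an illustration: the fair-coin coordinates, the seed, the sample size, and the two choices of $Z$ are assumptions made here and do not come from the text. It simulates one sample point $\omega$ and forms the averages of $Z = X_0$ and $Z = X_0 X_1$ along the shifted points $T^k\omega$.

```python
import random

def ergodic_average(omega, Z, n):
    """Z^(n)(omega) = (1/n) * sum_{k=0}^{n-1} Z(T^k omega) for the one-sided shift T.

    Z(omega, k) must evaluate Z at the shifted point T^k omega = (omega_k, omega_{k+1}, ...).
    """
    return sum(Z(omega, k) for k in range(n)) / n

rng = random.Random(0)      # fixed seed (an arbitrary choice) for reproducibility
n = 200_000
omega = [rng.randint(0, 1) for _ in range(n + 1)]   # iid coordinates, P{X_k = 1} = 1/2

def first_coordinate(w, k):     # Z = X_0, so E(Z) = 1/2
    return w[k]

def product_of_two(w, k):       # Z = X_0 * X_1, so E(Z) = 1/4 by independence
    return w[k] * w[k + 1]

avg_x0 = ergodic_average(omega, first_coordinate, n)
avg_prod = ergodic_average(omega, product_of_two, n)
print(avg_x0, avg_prod)         # expected to be close to 1/2 and 1/4
```

The second choice of $Z$ depends on two coordinates, so its summands are not independent; the ergodic theorem still applies because the shift with iid coordinates is ergodic.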

In the next section, it will be necessary to consider probability measures
$P_1$ and $P_2$ that are each preserved by a fixed measurable transformation $T$
on $(\Omega, \mathcal{F})$; $P_i$ is said to be ergodic (relative to $T$) iff $T$ is ergodic on
$(\Omega, \mathcal{F}, P_i)$. If $P_1$ and $P_2$ are both ergodic, they must be identical or mutually
singular.


8.3.11 Theorem. If $P_1$ and $P_2$ are ergodic probability measures relative to
$T$, then either $P_1 = P_2$ or $P_1 \perp P_2$.


Proof. Suppose that $P_1(A) \ne P_2(A)$ for some $A \in \mathcal{F}$, and let

\[
A_i = \{\omega : I_A^{(n)}(\omega) \to P_i(A)\}, \quad i = 1, 2.
\]

By 8.3.6 and 8.3.10, $P_1(A_1) = P_2(A_2) = 1$. But $A_1$ and $A_2$ are disjoint since
$P_1(A) \ne P_2(A)$, hence $P_1 \perp P_2$.


Theorem 8.3.11 gives us a criterion for ergodicity (unfortunately, impractical).


8.3.12 Theorem. The probability measure $P$ is ergodic relative to $T$ iff
there is no probability measure $P_1$ preserved by $T$ such that $P_1$ is absolutely
continuous with respect to $P$ but not identical to $P$.


Proof. If $P_1 \ll P$, $P_1 \ne P$ and $P$ is ergodic, then so is $P_1$. For if $A$ is an
invariant set, then $P(A) = 0$ or $P(A^c) = 0$ by ergodicity, hence $P_1(A) = 0$ or
$P_1(A^c) = 0$ by absolute continuity. By 8.3.11, $P_1 \perp P$. But then $P_1$ is both
absolutely continuous and singular with respect to $P$, hence $P_1$ is the zero
measure, a contradiction.

Conversely, if $P$ is not ergodic, let $A$ be a $T$-invariant set with $0 < P(A) < 1$.
Define $P_1(B) = P(B \mid A) = P(A \cap B)/P(A)$, $B \in \mathcal{F}$; then $P_1 \ll P$, and since
$P_1(A) = 1 \ne P(A)$, $P_1 \ne P$. Now

\[
\begin{aligned}
P_1(T^{-1}E) &= \frac{P(T^{-1}E \cap A)}{P(A)} \\
&= \frac{P(T^{-1}E \cap T^{-1}A)}{P(A)} && \text{since $A$ is invariant} \\
&= \frac{P(E \cap A)}{P(A)} && \text{since $T$ preserves $P$} \\
&= P_1(E).
\end{aligned}
\]

Thus $P_1$ is preserved by $T$.


Problems 


1. Let $T(\omega) = \omega + 1$ on $R$ (with Borel sets and Lebesgue measure). If $f$ is
the indicator of $(0, 1]$, show that $f^{(n)}$ does not converge in $L^1$ to $\hat{f}$.

2. Let $\Omega = R^\infty$, $\mathcal{F} = [\mathcal{B}(R)]^\infty$, $X_n(\omega_0, \omega_1, \ldots) = \omega_n$, $P$ the unique probability
measure making the $X_n$ independent with $P\{X_n = 0\} = P\{X_n = 1\} = 1/2$.
(In other words, consider an infinite sequence of Bernoulli trials
with probability 1/2 of success on a given trial.) If $T$ is the one-sided shift
and $f(\omega) = \omega_0$, show that $n^{-1} \sum_{k=0}^{n-1} f(T^k\omega)$ does not converge in $L^\infty$ to $\hat{f}$.

3. Let $T$ be a measure-preserving transformation on $(\Omega, \mathcal{F}, \mu)$. If $T$ is
ergodic, we know that for every $f \in L^1$, $f^{(n)}$ converges a.e. to a constant.
Conversely, if $\mu(\Omega) < \infty$ and if for every $f \in L^1$ there is a constant
$c = c(f)$ such that $f^{(n)} \to c$ a.e., show that $T$ is ergodic. Give a counterexample
to this statement if $\mu(\Omega) = \infty$.

4. Let $T$ be an ergodic measure-preserving transformation on $(\Omega, \mathcal{F}, \mu)$
with $\mu(\Omega) < \infty$. Let $f$ be a real-valued Borel measurable function such
that $\int_\Omega f\, d\mu$ exists. If $f \in L^1$, we know that $f^{(n)}$ converges a.e. to
$[\mu(\Omega)]^{-1} \int_\Omega f\, d\mu$. Conversely, if $f^{(n)}$ converges a.e. to a finite limit,
show that $f \in L^1$. (A special case of this result was considered in 6.2,
Problem 1. Note also that the result fails when $\mu(\Omega) = \infty$; take $f \equiv c$.)


5. (Mean Ergodic Theorem in a Hilbert Space) Let $U$ be a bounded linear
operator on the Hilbert space $H$, and let $U^*$ be the adjoint of $U$, defined by
the requirement that $\langle Uf, g \rangle = \langle f, U^*g \rangle$ for all $f, g \in H$. ($f \to \langle Uf, g \rangle$
is a continuous linear functional on $H$, so by 3.3.4(a) there is a unique
element $h \in H$ such that $\langle Uf, g \rangle = \langle f, h \rangle$ for every $f \in H$; $h$ depends
on $g$ and we may write $h$ as $U^*g$. It follows from the basic properties of
the inner product that $U^*$ is a bounded linear operator on $H$.)

Establish the following results.

(a) The following conditions are equivalent (and define a unitary
    operator, that is, an invertible isometry).
    i. $UU^* = U^*U = I$, the identity operator on $H$.
    ii. $U$ is one-to-one onto, and $\langle f, g \rangle = \langle Uf, Ug \rangle$ for all $f, g \in H$.
    iii. $U$ is one-to-one onto, and $\|Uf\| = \|f\|$ for all $f \in H$.

(b) The following conditions are equivalent (and define an isometry).
    i. $U^*U = I$.
    ii. $\langle f, g \rangle = \langle Uf, Ug \rangle$ for all $f, g \in H$.
    iii. $\|Uf\| = \|f\|$ for all $f \in H$.

For the remainder of the problem, $U$ is an isometry of $H$.

(c) If $f \in H$, then $Uf = f$ iff $U^*f = f$.

(d) Define $A_n = n^{-1}(I + U + U^2 + \cdots + U^{n-1})$; note that
    \[
    \|A_n\| \le n^{-1} \sum_{k=0}^{n-1} \|U^k\| = 1.
    \]
    If $E = \{f \in H : \lim_{n\to\infty} A_n f \text{ exists (in } H)\}$, $E$ is a closed subspace
    of $H$.

(e) Let $M$ be the set of elements of $H$ that are invariant under $U$,
    that is, $M = \{f \in H : Uf = f\}$; note that $M$ is a closed subspace
    of $H$, by continuity of $U$. Let $N_0 = \{g - Ug : g \in H\}$. If we define
    $\hat{f} = \lim_{n\to\infty} A_n f$ (where the limit exists) then:

        $f \in M$ implies $f \in E$ and $\hat{f} = f$;
        $f \in N_0$ implies $f \in E$ and $\hat{f} = 0$.

(f) If $N = \overline{N_0}$, the closure of $N_0$, then $H$ is the orthogonal direct sum
    of $M$ and $N$ (see the discussion after 3.2.11).

(g) (Mean Ergodic Theorem) Let $U$ be an isometry of the Hilbert space
    $H$, and let $P$ be the projection of $H$ on the space $M$ of all elements
    invariant under $U$. For every $f \in H$,
    \[
    \frac{1}{n} \sum_{k=0}^{n-1} U^k f \to Pf.
    \]

(h) If $T$ is a measure-preserving transformation and $U = T$ (regarded as
    the operator $f \to f \circ T$ on $L^2$), we obtain $L^2$ convergence of $f^{(n)}$ to $\hat{f}$.
    If, in addition, $T$ is invertible and $S = T^{-1}$, then $U$ is a unitary
    operator and $U^* = U^{-1} = S$ (regarded in the same way as an operator on $L^2$).


6. Let $T$ be a measurable transformation on $(\Omega, \mathcal{F})$. Within the linear space
of finite signed measures on $\mathcal{F}$, let $K$ be the convex set of probability
measures preserved by $T$. Show that $P$ is ergodic relative to $T$ iff $P$ is an
extreme point of $K$, that is, $P$ cannot be expressed as $\lambda_1 P_1 + \lambda_2 P_2$, with

\[
\lambda_1, \lambda_2 > 0, \quad \lambda_1 + \lambda_2 = 1, \quad P_1, P_2 \in K, \quad P_1 \ne P_2.
\]


7. This problem gives many conditions equivalent to ergodicity; in particular,
some of the conditions involve convergence in probability rather than
almost everywhere convergence.

Let $T$ be a measure-preserving transformation on the probability space
$(\Omega, \mathcal{F}, P)$, and let $\mathcal{F}_0$ be a field of sets whose minimal σ-field is $\mathcal{F}$.
Show that the following conditions are equivalent:

(a) $T$ is ergodic.

(b) $I_A^{(n)} \to P(A)$ a.e. for each $A \in \mathcal{F}$, where
    \[
    I_A^{(n)}(\omega) = n^{-1} \sum_{k=0}^{n-1} I_A(T^k\omega).
    \]

(c) $I_A^{(n)} \to P(A)$ in probability for each $A \in \mathcal{F}$.

(d) $I_A^{(n)} \to P(A)$ in probability for each $A \in \mathcal{F}_0$.

(e) $I_A^{(n)} \to P(A)$ a.e. for each $A \in \mathcal{F}_0$.

(f) $n^{-1} \sum_{k=0}^{n-1} P(A \cap T^{-k}B) \to P(A)P(B)$ for all $A, B \in \mathcal{F}_0$.

(g) $n^{-1} \sum_{k=0}^{n-1} P(A \cap T^{-k}B) \to P(A)P(B)$ for all $A, B \in \mathcal{F}$.

If $T$ is a one-sided shift transformation, we may take $\mathcal{F}_0$ to be the field
of measurable cylinders. Furthermore, if the coordinate random variables
take on only finitely many possible values, a measurable cylinder is a finite
disjoint union of sets of the form $\{X_0 = i_0, \ldots, X_m = i_m\}$. Thus condition
(d) is equivalent to the following statement:

For each $\alpha = (i_0, \ldots, i_m)$, $m = 0, 1, \ldots$, with the $i_j$ belonging to the
coordinate space, let $N_\alpha^n$ be the number of times that $i_0, \ldots, i_m$ occur in
sequence in the first $n + m$ coordinates, that is,

\[
N_\alpha^n(\omega) = \sum_{k=0}^{n-1} I_A(T^k\omega), \quad \text{where } A = \{\omega : \omega_0 = i_0, \ldots, \omega_m = i_m\};
\]

then $n^{-1} N_\alpha^n$ converges in probability to $p(\alpha) = P(A)$.

Note also that by the Kolmogorov extension theorem, given any one-sided
shift with coordinate random variables $X_n$, there is a two-sided shift with
coordinate random variables $X_n'$ such that $(X_n', n \ge 0)$ and $(X_n, n \ge 0)$
have the same distribution. [For example, specify that $(X_{-8}', X_{-3}', X_6')$
have the same distribution as $(X_0, X_5, X_{14})$.] Condition (d) shows that the
one-sided shift is ergodic iff the corresponding two-sided shift is ergodic;
by 8.2.7, the same is true if "ergodic" is replaced by "mixing."


8.4 APPLICATIONS TO MARKOV CHAINS 

If $T$ is a one-sided shift transformation and the coordinate random variables
$X_n$ are independent, we have seen that $T$ is ergodic (we also say that the
sequence $\{X_n\}$ is ergodic). In this section we exhibit a large class of examples
in which $T$ is ergodic, but the $X_n$ are not independent.

First, we must consider some general properties of shift transformations.
If $X_0, X_1, \ldots$ are the coordinate random variables of a one-sided shift, recall
(see 6.2.6) that the tail σ-field of the $X_n$ is defined by $\mathcal{F}_\infty = \bigcap_{n=0}^\infty \mathcal{F}_n$, where
$\mathcal{F}_n = \mathcal{F}(X_n, X_{n+1}, \ldots)$, the smallest σ-field that makes $X_i$ measurable for all
$i \ge n$. We prove that the σ-field of almost invariant sets is essentially included
in $\mathcal{F}_\infty$.


8.4.1 Theorem. If $A$ is an almost invariant set, there is a set $B \in \mathcal{F}_\infty$ such
that $P(A \,\Delta\, B) = 0$.


Proof. By 8.2.2(a), there is a strictly invariant set $B$ with $P(A \,\Delta\, B) = 0$.
Now if $C \in \mathcal{F}$ then $T^{-n}C \in \mathcal{F}_n$, for if $C = \{(X_0, X_1, \ldots) \in C'\}$, then $T^{-n}C
= \{(X_n, X_{n+1}, \ldots) \in C'\}$. (Actually, $C = C'$ here since the $X_n$ are coordinate
random variables.) But $B = T^{-n}B$, so that $B \in \mathcal{F}_n$ for all $n$, hence
$B \in \mathcal{F}_\infty$.


The proof of 8.4.1 shows that all strictly invariant sets belong to $\mathcal{F}_\infty$;
however, an almost invariant set might not be in $\mathcal{F}_\infty$. For example, let
$A = \{\omega : X_n(\omega) = X_0(\omega) \text{ for all } n\}$; then $A \subset T^{-1}A$, so that $A$ is almost
invariant by 8.2.4; but $A \notin \mathcal{F}_\infty$.

The following fact about conditional probabilities will be used. 


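8.4.2 Lemma. If $T$ is a measure-preserving transformation on $(\Omega, \mathcal{F}, P)$,
and $\mathcal{G}$ is a sub σ-field of $\mathcal{F}$, then for any $A \in \mathcal{F}$,

\[
P(A \mid \mathcal{G})(T\omega) = P(T^{-1}A \mid T^{-1}\mathcal{G})(\omega) \quad \text{a.e.}
\]

In particular, if $T$ is a shift transformation, then

\[
P(A \mid X_n)(T\omega) = P(T^{-1}A \mid X_{n+1})(\omega) \quad \text{a.e.}
\]


Proof. If $T^{-1}B \in T^{-1}\mathcal{G}$, then

\[
P(T^{-1}A \cap T^{-1}B) = \int_{T^{-1}B} P(T^{-1}A \mid T^{-1}\mathcal{G})\, dP.
\]

But since $T$ is measure-preserving,

\[
P(T^{-1}A \cap T^{-1}B) = P(A \cap B) = \int_B P(A \mid \mathcal{G})\, dP
= \int_{T^{-1}B} P(A \mid \mathcal{G})(T\omega)\, dP(\omega) \quad \text{by 1.6.12.}
\]

Since $P(A \mid \mathcal{G})(T\omega)$ is $T^{-1}\mathcal{G}$-measurable, the result follows.


We may give an intuitive interpretation of 8.4.2. If $T$ is a shift transformation
and $\omega^* = (\omega_0^*, \omega_1^*, \ldots) \in R^\infty$, then $P(T^{-1}A \mid X_{n+1})(\omega^*)$ is the probability that
$\omega \in T^{-1}A$, given that the $(n + 1)$th coordinate of $\omega$ is $X_{n+1}(\omega^*) = \omega_{n+1}^*$;
$P(A \mid X_n)(T\omega^*)$ is the probability that $T\omega \in A$, given that the $n$th coordinate
of $T\omega$ is $X_n(T\omega^*) = \omega_{n+1}^*$; these two expressions agree.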

Our interest will be in sequences having the Markov property, defined as 
follows. 


8.4.3 Definition. A sequence of random variables $\{X_n, n \ge 0\}$ is said to
have the Markov property iff for each $B \in \mathcal{F}(X_{n+1}, X_{n+2}, \ldots)$, $n = 0, 1, \ldots$,

\[
P(B \mid X_0, \ldots, X_n) = P(B \mid X_n) \quad \text{a.e.}
\]

If $\{X_n\}$ is a Markov chain (see 4.11, and Ash, 1970, Chapter 7), then

\[
P\{X_{n+1} = i_{n+1}, \ldots, X_{n+k} = i_{n+k} \mid X_0 = i_0, \ldots, X_n = i_n\}
= P\{X_{n+1} = i_{n+1}, \ldots, X_{n+k} = i_{n+k} \mid X_n = i_n\}
\]

for all $i_0, \ldots, i_{n+k}$ in the state space. Thus the Markov property holds when
$B$ is a measurable cylinder, hence for all $B \in \mathcal{F}(X_{n+1}, X_{n+2}, \ldots)$ by the
monotone class theorem.

In the Markov case, almost invariant sets assume a special form. 


8.4.4 Theorem. Let the coordinate random variables $X_n$ of a one-sided shift
have the Markov property. If $A$ is an almost invariant set, there is a set $B$ of
the form $\{X_0 \in C\}$, $C \in \mathcal{B}(R)$, such that $A = B$ a.e., that is, $P(A \,\Delta\, B) = 0$.


Proof. By 8.4.1, there is a set $B$ in the tail σ-field $\mathcal{F}_\infty$ such that $P(A \,\Delta\, B) =
0$. Since $B \in \mathcal{F}(X_{n+1}, X_{n+2}, \ldots)$ for all $n$, $P(B \mid X_0, \ldots, X_n) = P(B \mid X_n)$ a.e.
Consequently, $P(A \mid X_0, \ldots, X_n) = P(A \mid X_n)$ a.e. Now

\[
P(A \mid X_0, \ldots, X_n) \to P(A \mid X_0, X_1, \ldots) \quad \text{a.e.}
\]

(see 6.6.4); but

\[
P(A \mid X_0, X_1, \ldots) = E(I_A \mid X_0, X_1, \ldots) = I_A \quad \text{a.e.,}
\]

since $A \in \mathcal{F}(X_0, X_1, \ldots) = \mathcal{F}$. Therefore $P(A \mid X_n) \to I_A$ a.e.
Given $\varepsilon > 0$, let $H_n = \{\omega : |P(A \mid X_n)(\omega) - I_A(\omega)| \ge \varepsilon\}$. Then $P(H_n) \to 0$
since $P(A \mid X_n) \to I_A$ a.e., hence in probability. Now

\[
\begin{aligned}
T^{-1}(H_n) &= \{\omega : |P(A \mid X_n)(T\omega) - I_A(T\omega)| \ge \varepsilon\} \\
&= \{\omega : |P(A \mid X_{n+1})(\omega) - I_A(\omega)| \ge \varepsilon\} \quad \text{a.e.}
\end{aligned}
\]

by 8.4.2 and the almost invariance of $A$. Thus $T^{-1}(H_n) = H_{n+1}$ a.e. Since
$T$ preserves $P$, $P(H_n) = P(T^{-1}(H_n)) = P(H_{n+1})$ for all $n$. Since $P(H_n) \to 0$,
we must have $P(H_n) = 0$, so $P(A \mid X_n) = I_A$ a.e. for all $n$; in particular,
$I_A = P(A \mid X_0)$ a.e.
Now $P(A \mid X_0)$ is $\mathcal{F}(X_0)$-measurable, hence can be expressed as $f(X_0)$ for
some Borel measurable $f : R \to R$ [see 5.4.2(c)]. Thus (a.e.)

    $\omega \in A$ iff $I_A(\omega) = 1$
             iff $f(X_0(\omega)) = 1$
             iff $X_0(\omega) \in C$, where $C = f^{-1}\{1\}$.


8.4.5 Corollary. Under the hypothesis of 8.4.4, a set $A \in \mathcal{F}$ is almost
invariant iff $A$ is of the form $\{X_n \in C \text{ for all } n\}$ for some $C \in \mathcal{B}(R)$.


Proof. Let $A$ be almost invariant. By 8.4.4, $A$ is of the form $\{X_0 \in C\}$. But
$A = T^{-1}A$ a.e., so $\{X_0 \in C\} = \{X_1 \in C\}$ a.e. Inductively, $A = \{X_n \in C \text{ for
all } n\}$. Conversely, every set $A$ of this form has $A \subset T^{-1}A$, hence is almost
invariant by 8.2.4.


We therefore have the following criterion for ergodicity of a Markov
sequence.


8.4.6 Theorem. If $X_n$ has the Markov property, then $\{X_n\}$ is not ergodic iff
there is a set $C \in \mathcal{B}(R)$ such that $0 < P\{X_n \in C \text{ for all } n\} < 1$.


Proof. Apply 8.4.5 and 8.2.2(b).
We now apply these results to Markov chains. Let $\{X_n\}$ be a Markov chain;
assume the initial distribution $\{v_i\}$ is a stationary distribution, so that $\{X_n\}$ is
a stationary sequence and the machinery of ergodic theory is applicable. (See
Ash, 1970, pp. 236-240, for the appropriate background material on Markov
chains.)


8.4.7 Theorem.

(a) If there is exactly one positive recurrent class $C$, then $\{X_n\}$ is ergodic.

(b) If there are at least two positive recurrent classes $C_1$ and $C_2$, and
$\sum_{i \in C_1} v_i > 0$, $\sum_{i \in C_2} v_i > 0$, then $\{X_n\}$ is not ergodic.


Proof. (a) Since the stationary distribution assigns probability 1 to $C$, we
may as well assume that $C$ is the entire space. Let $D$ be a nonempty proper
subset of $C$, and let $i \in D$, $j \in C - D$. By recurrence, if the initial state is $i$,
then $j$ must be visited; hence

\[
P\{X_n \in D \text{ for all } n\} = \sum_i v_i\, P\{X_n \in D \text{ for all } n \mid X_0 = i\} = 0.
\]

By 8.4.6, $\{X_n\}$ is ergodic.
(b) It is impossible to exit from a recurrent class, hence

\[
P\{X_n \in C_1 \text{ for all } n\} = P\{X_0 \in C_1\} = \sum_{i \in C_1} v_i \in (0, 1).
\]

By 8.4.6, $\{X_n\}$ is not ergodic.
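
Part (b) can be seen in a small simulation. The following Python sketch is an illustration under assumptions made here, not an example from the text: the four-state transition matrix, the mixing weights 0.3 and 0.7, the seed, and the path length are all hypothetical. The chain has two closed classes, so each sample path stays in the class where it starts, and the relative frequency of visits to $C_1$ is 0 or 1 rather than a constant.

```python
import random

# Hypothetical chain with two closed (positive recurrent) classes C1 = {0, 1}, C2 = {2, 3}.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.8, 0.2],
]
# A stationary distribution: weight 0.3 on C1 and 0.7 on C2, uniform inside each class.
v = [0.15, 0.15, 0.35, 0.35]

def fraction_in_C1(rng, length):
    """Relative frequency of visits to C1 along one path with initial distribution v."""
    state = rng.choices(range(4), weights=v)[0]
    visits = 0
    for _ in range(length):
        visits += state in (0, 1)
        state = rng.choices(range(4), weights=P[state])[0]
    return visits / length

rng = random.Random(1)      # arbitrary fixed seed
path_fractions = [fraction_in_C1(rng, 50) for _ in range(2000)]
share_in_C1 = sum(f == 1.0 for f in path_fractions) / len(path_fractions)
print(share_in_C1)          # expected to be near 0.3 = P{X_0 in C1}
```

With $C$ taken to be $C_1$, the event $\{X_n \in C \text{ for all } n\}$ has probability 0.3 in this sketch, which is strictly between 0 and 1 as in 8.4.6.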


The case in which there are no positive recurrent classes is not discussed in 
8.4.7 because in this case, there is no stationary distribution for the chain. Note 
that if there is exactly one positive recurrent class, the stationary distribution 
is unique. 

We now have many examples of ergodic sequences {X,,} where the X, need 
not be independent. For example, consider a finite Markov chain such that 
every state is reachable from every other state. The state space then forms 
a single equivalence class, necessarily recurrent positive. Thus if the initial 
distribution is the unique stationary distribution, the sequence {X,,} is ergodic. 
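
A finite chain of this kind can be simulated directly. The following Python sketch uses a hypothetical three-state transition matrix (an assumption for illustration only) in which every state is reachable from every other state. It approximates the stationary distribution by iterating $v \leftarrow vP$, which converges here because this particular chain is also aperiodic, and then compares it with the relative frequencies of visits along one long path.

```python
import random

# Hypothetical 3-state chain; every state is reachable from every other state.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

def stationary(P, iterations=200):
    """Approximate the stationary distribution by repeatedly applying v <- vP."""
    m = len(P)
    v = [1.0 / m] * m
    for _ in range(iterations):
        v = [sum(v[i] * P[i][j] for i in range(m)) for j in range(m)]
    return v

def visit_frequencies(P, v, n, rng):
    """Relative frequency of visits to each state along one path started from v."""
    m = len(P)
    state = rng.choices(range(m), weights=v)[0]
    counts = [0] * m
    for _ in range(n):
        counts[state] += 1
        state = rng.choices(range(m), weights=P[state])[0]
    return [c / n for c in counts]

v = stationary(P)                                   # close to (1/4, 1/2, 1/4)
freq = visit_frequencies(P, v, 100_000, random.Random(2))
print(v, freq)
```

For this matrix the stationary distribution can be checked by hand: $v = (1/4, 1/2, 1/4)$ satisfies $vP = v$.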

Now suppose $\{X_n\}$ is an ergodic Markov chain, and assume that the entire
space forms a positive recurrent class. Then by the pointwise ergodic theorem,
if $f \in L^1(\Omega, \mathcal{F}, P)$, then

\[
\frac{1}{n} \sum_{k=0}^{n-1} f(T^k\omega) \to E(f) \quad \text{a.e.}
\]

We have been assuming that the initial distribution $\{v_i\}$ is stationary, but this
result holds regardless of the initial distribution. For

\[
0 = P\Big\{ \frac{1}{n} \sum_{k=0}^{n-1} f(T^k\omega) \not\to E(f) \Big\}
= \sum_i v_i\, P\Big\{ \frac{1}{n} \sum_{k=0}^{n-1} f(T^k\omega) \not\to E(f) \,\Big|\, X_0 = i \Big\}.
\]

Since $v_i > 0$ for all $i$, we have $P\{n^{-1} \sum_{k=0}^{n-1} f(T^k\omega) \not\to E(f) \mid X_0 = i\}
= 0$ for all $i$, and the result follows.

If a Markov chain has exactly one positive recurrent class, and therefore a
unique stationary distribution $\{v_i\}$, then the mean recurrence time of state $j$,
that is, the average number of steps required to return to $j$ when the initial
state is $j$, is the reciprocal of the probability $v_j$ that the chain will be in state
$j$ at any particular time. This is intuitively reasonable; if, say, $v_j = 1/4$, then in
the long run, we are in state $j$ on one out of four trials, so on the average it
should take four steps to return to $j$.

We are going to prove a more general result of this type. 


8.4.8 Theorem. Let $T$ be a measure-preserving transformation on the
probability space $(\Omega, \mathcal{F}, P)$. If $A \in \mathcal{F}$, let

\[
A_k = \{\omega \in A : T^n\omega \notin A,\ n = 1, \ldots, k-1,\ T^k\omega \in A\};
\]

thus $A_k$ is the set of points $\omega \in A$ such that $T^n\omega$ returns to $A$ for the first time
at $n = k$.

Define the recurrence time of $A$ by $r_A(\omega) = k$ if $\omega \in A_k$, $k = 1, 2, \ldots$;
$r_A(\omega) = \infty$ if $\omega \in A - \bigcup_{k=1}^\infty A_k$ (define $r_A$ arbitrarily on $A^c$). Then

\[
\int_A r_A(\omega)\, dP(\omega) = P\Big( \bigcup_{n=0}^\infty T^{-n}A \Big).
\]

Before giving the proof, let us go into more detail on the meaning of the
theorem. If $T$ is ergodic and $P(A) > 0$, let $E = \bigcup_{n=0}^\infty T^{-n}A$. Since $T^{-1}E \subset E$,
$E$ is almost invariant by 8.2.4, and since $P(E) \ge P(A) > 0$, $P(E)$ must be 1.
Thus if $Q(B) = P(B \mid A) = P(B \cap A)/P(A)$, $B \in \mathcal{F}$, we have

\[
\int_A r_A(\omega)\, dQ(\omega) = \frac{1}{P(A)}.
\]

If $T$ is a one-sided shift and $A = \{X_0 \in C\}$, then $\int_A r_A\, dQ$ is the average length
of time required for $X_n$ to return to the set $C$, given that the initial value $X_0$
belongs to $C$. Thus the mean recurrence time of $C$ is the reciprocal of the
probability that the process will be in $C$ at any particular time.
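
The identity in 8.4.8 can be checked exactly on a finite example. In the following Python sketch, $\Omega$ is a ten-point set with the uniform probability measure and $T$ is a permutation, which preserves that measure; the particular permutation and the set $A$ are assumptions chosen for illustration. Exact rational arithmetic is used so that both sides can be compared without rounding.

```python
from fractions import Fraction

def recurrence_time(T, A, w):
    """r_A(w): the first k >= 1 with T^k(w) in A (finite here, since T is a permutation and w is in A)."""
    k, x = 1, T[w]
    while x not in A:
        x = T[x]
        k += 1
    return k

def eventually_hits(T, A, w):
    """True iff T^n(w) is in A for some n >= 0, that is, w lies in the union of the sets T^{-n}A."""
    seen = set()
    x = w
    while x not in seen:
        if x in A:
            return True
        seen.add(x)
        x = T[x]
    return False

# Hypothetical permutation of {0, ..., 9} with cycles (0 1 2 3 4 5), (6 7 8), (9).
T = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 0, 6: 7, 7: 8, 8: 6, 9: 9}
A = {0, 2, 3, 7}
N = len(T)

lhs = sum(Fraction(recurrence_time(T, A, w), N) for w in A)    # integral of r_A over A
rhs = Fraction(sum(eventually_hits(T, A, w) for w in T), N)    # P(union of T^{-n}A)
print(lhs, rhs)   # both equal 9/10
```

Here $T$ is not ergodic (the fixed point 9 never reaches $A$), which is why the common value is 9/10 rather than 1.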


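Proof. Let $C_k = T^{-k}A$, $B_k = C_k^c$, $k = 0, 1, \ldots$ (take $C_0 = A$). Then, with
intersections written as products,

\[
\begin{aligned}
P(A_{k+1}) &= P(C_0 B_1 \cdots B_k C_{k+1}), \quad k \ge 1 \\
&= P(C_0 B_1 \cdots B_k) - P(C_0 B_1 \cdots B_{k+1}) \\
&= P(B_1 \cdots B_k) - P(B_0 B_1 \cdots B_k) - P(B_1 \cdots B_{k+1}) + P(B_0 B_1 \cdots B_{k+1}) \\
&= P(B_0 \cdots B_{k-1}) - 2P(B_0 \cdots B_k) + P(B_0 \cdots B_{k+1}) \quad \text{since $T$ preserves $P$} \\
&= b_k - 2b_{k+1} + b_{k+2}
\end{aligned}
\]

where $b_k = P(B_0 \cdots B_{k-1})$, $k \ge 1$. When $k = 0$ we have

\[
\begin{aligned}
P(A_1) &= P(C_0 C_1) \\
&= P(C_0) - P(C_0 B_1) \\
&= 1 - P(B_0) - P(B_1) + P(B_0 B_1) \\
&= 1 - 2P(B_0) + P(B_0 B_1),
\end{aligned}
\]

hence we have $P(A_{k+1}) = b_k - 2b_{k+1} + b_{k+2}$ for all $k \ge 0$, if we take $b_0 = 1$.

Now

\[
\begin{aligned}
\int_A r_A\, dP &= \sum_{k=0}^\infty (k + 1) P(A_{k+1}) \\
&= \lim_{n\to\infty} \Big[ \sum_{k=0}^{n-1} (k + 1) b_k - 2 \sum_{k=1}^{n} k\, b_k + \sum_{k=2}^{n+1} (k - 1) b_k \Big] \\
&= \lim_{n\to\infty} [1 - n(b_n - b_{n+1}) - b_n].
\end{aligned}
\]

Now $b_n - b_{n+1} = P(B_0 \cdots B_{n-1} C_n) \ge 0$, and

\[
b_n \to P\Big( \bigcap_{n=0}^\infty B_n \Big) = 1 - P\Big( \bigcup_{n=0}^\infty C_n \Big) = 1 - P\Big( \bigcup_{n=0}^\infty T^{-n}A \Big).
\]

Since $1 - n(b_n - b_{n+1}) - b_n$ has a limit and $b_n$ approaches a finite limit,
$n(b_n - b_{n+1})$ must approach a finite limit. But the sets $B_0 \cdots B_{n-1} C_n$
are disjoint, so that $\sum_n (b_n - b_{n+1}) < \infty$; this implies that $n(b_n - b_{n+1})
\to 0$.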
Problems


1. The one-sided shift transformation $T$ is said to be a Kolmogorov shift (or to
be tail trivial) iff the tail σ-field $\mathcal{F}_\infty$ consists only of sets with probability
0 or 1. It follows from 8.4.1 that a Kolmogorov shift is ergodic; show
that, in fact, every Kolmogorov shift is mixing.

2. Let $T$ be the shift transformation associated with a finite Markov chain
where the initial distribution is stationary. Assume that $v_i > 0$ for all $i$. (If
$v_i = 0$, then $P\{X_n = i \text{ for some } n\} \le \sum_n P\{X_n = i\} = 0$, so that $i$ may
as well be removed from the state space.)

Show that $T$ is mixing iff a steady-state distribution exists, that is, iff $p_{ij}^{(n)}
\to q_j$ (independent of $i$) as $n \to \infty$, where $\sum_j q_j = 1$. (Necessarily,
$q_j = v_j$, and the stationary distribution is unique; see Ash, 1970,
pp. 236-237.)

3. Let $X_1, X_2, \ldots$ be iid random variables, and let $S_n = \sum_{k=1}^n X_k$,
$n = 1, 2, \ldots$.
(a) If $R_n(\omega)$ is the number of distinct points in the set $\{S_1(\omega), \ldots,
S_n(\omega)\}$, and $A$ is the event that $S_n$ never returns to 0, that is,
$A = \{S_1 \ne 0, S_2 \ne 0, \ldots\}$, show that $n^{-1}E(R_n) \to P(A)$. (Hint:
Express $R_n$ as $1 + \sum_{k=2}^n I_{B_k}$, where

\[
B_k = \{S_k \ne S_{k-1}, S_k \ne S_{k-2}, \ldots, S_k \ne S_1\}.)
\]


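(b) For a fixed positive integer $N$, let $Z_k(\omega)$ be the number of distinct
points in $\{S_{(k-1)N+1}(\omega), \ldots, S_{kN}(\omega)\}$, $k = 1, 2, \ldots$. Use the strong
law of large numbers to show that

\[
n^{-1}(Z_1 + \cdots + Z_n) \to E(Z_1) = E(R_N) \quad \text{a.e.}
\]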


(c) Show that $\limsup_{n\to\infty} n^{-1}R_n \le P(A)$ a.e.
(d) Let $V_k$ be the indicator of the set $\{S_{k+1} \ne S_k, S_{k+2} \ne S_k, \ldots\}$; thus
$V_k = 1$ iff $S_k$ is never revisited. Use the pointwise ergodic theorem
to show that $n^{-1}(V_1 + \cdots + V_n) \to E(V_1)$ a.e.

(e) Show that $\liminf_{n\to\infty} n^{-1}R_n \ge E(V_1)$ a.e., and conclude that $n^{-1}R_n
\to P(A)$ a.e.


8.5 THE SHANNON-MCMILLAN THEOREM

We will apply the pointwise ergodic theorem to prove a basic result of
information theory. Consider a shift transformation with coordinate random
variables $X_n$ taking values in a countable set. We define a rather unusual
sequence of random variables $p(X_0, X_1, \ldots, X_{n-1})$ as follows.
If $X_0(\omega) = i_0, \ldots, X_{n-1}(\omega) = i_{n-1}$, let

\[
p(X_0(\omega), \ldots, X_{n-1}(\omega)) = P\{\omega' : X_0(\omega') = i_0, \ldots, X_{n-1}(\omega') = i_{n-1}\}.
\]

For example, if $P\{X_0 = 1\} = P\{X_0 = 2\} = 1/4$, $P\{X_0 = 3\} = 1/2$, then $p(X_0)$ has
the value 1/4 with probability 1/2, namely, when $X_0 = 1$ or 2, and $p(X_0) = 1/2$
with probability 1/2, that is, when $X_0 = 3$.

Similarly, define the random variable $p(X_n \mid X_{n-1}, X_{n-2}, \ldots, X_{n-r})$
by specifying that if $X_n = i_n, X_{n-1} = i_{n-1}, \ldots, X_{n-r} = i_{n-r}$, then $p(X_n \mid
X_{n-1}, \ldots, X_{n-r}) = P\{X_n = i_n \mid X_{n-1} = i_{n-1}, \ldots, X_{n-r} = i_{n-r}\}$, assuming
$P\{X_{n-1} = i_{n-1}, \ldots, X_{n-r} = i_{n-r}\} > 0$.

The Shannon-McMillan theorem is an assertion about the convergence of
$-n^{-1} \log p(X_0, \ldots, X_{n-1})$; in particular, if $T$ is ergodic, we have convergence
to a constant $H$, almost everywhere and in $L^1$. Before turning to the proof, let us
look at the intuitive interpretation of the convergence statement. For this, we
require only that $-n^{-1} \log p(X_0, \ldots, X_{n-1})$ converge to $H$ in probability. (It is
traditional in information theory to use logs to the base 2, and we shall follow
this practice here. Switching to natural logs involves only a multiplicative
constant.)
Given $\varepsilon > 0$, $\delta > 0$, let

\[
A_n = \Big\{ \Big| -\frac{1}{n} \log p(X_0, \ldots, X_{n-1}) - H \Big| < \delta \Big\};
\]

thus if $\omega \in A_n$, then $p(X_0(\omega), \ldots, X_{n-1}(\omega))$ is between $2^{-n(H+\delta)}$ and $2^{-n(H-\delta)}$.
If $-n^{-1} \log p(X_0, \ldots, X_{n-1}) \to H$ in probability, then $P(A_n) \ge 1 - \varepsilon$ for
sufficiently large $n$.

Now let $S_n$ be the set of all sequences of length $n$, with values
in the coordinate space, corresponding to points in $A_n$, that is, $S_n$ is
the set of all sequences $(X_0(\omega), \ldots, X_{n-1}(\omega))$, $\omega \in A_n$. If $(i_0, \ldots, i_{n-1}) \in
S_n$, then $2^{-n(H+\delta)} < P\{X_0 = i_0, \ldots, X_{n-1} = i_{n-1}\} < 2^{-n(H-\delta)}$; furthermore,
$P\{(X_0, \ldots, X_{n-1}) \in S_n\} = P(A_n)$, so that $1 - \varepsilon \le P\{(X_0, \ldots, X_{n-1}) \in S_n\}
\le 1$.

Thus each sequence in $S_n$ has a probability between $2^{-n(H+\delta)}$ and $2^{-n(H-\delta)}$,
and the total probability assigned to $S_n$ is between $1 - \varepsilon$ and 1. Consequently,
the maximum number of sequences in $S_n$ is $1/2^{-n(H+\delta)} = 2^{n(H+\delta)}$ and the
minimum number is $(1 - \varepsilon)/2^{-n(H-\delta)} = (1 - \varepsilon)2^{n(H-\delta)}$.

Thus, roughly, for large $n$ there are approximately $2^{nH}$ sequences of length
$n$, each with probability approximately $2^{-nH}$; the remaining sequences are
negligible, that is, have total probability at most $\varepsilon$. In information theory, this
is referred to as the asymptotic equipartition property. If $\{X_n\}$ represents the
output of an "information source" (such as a language) with $r$ symbols, then
of all the $r^n = 2^{n \log r}$ possible sequences of length $n$, only $2^{nH}$ can reasonably
be expected to appear.

The number $H$ will turn out to be the entropy of the sequence $\{X_n\}$, and
we must discuss this concept before going any further.
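
The counting argument above can be checked on the simplest source. The following Python sketch is an illustration under assumptions made here: the coordinates are taken to be iid with $P\{X_k = 1\} = 0.2$, and the values $n = 250$ and $\delta = 0.1$ are arbitrary. For such a source the probability of a sequence depends only on its number of ones, so the size of $S_n$ and the probability $P(A_n)$ can be computed exactly from binomial coefficients.

```python
from math import comb, log2

p, n, delta = 0.2, 250, 0.1     # hypothetical source parameter, block length, tolerance
H = -(p * log2(p) + (1 - p) * log2(1 - p))      # entropy per symbol, in bits

def per_symbol_information(k):
    """-(1/n) log2 of the probability of any single sequence with exactly k ones."""
    return -(k * log2(p) + (n - k) * log2(1 - p)) / n

# Numbers of ones for which a sequence belongs to S_n.
typical_k = [k for k in range(n + 1) if abs(per_symbol_information(k) - H) < delta]

size_S = sum(comb(n, k) for k in typical_k)                                  # number of sequences in S_n
prob_S = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in typical_k)        # P(A_n)

print(prob_S)                   # close to 1
print(log2(size_S) / n, H)      # exponent of |S_n| per symbol, compared with H
```

In this sketch $\log_2 |S_n|$ lies between $\log_2 P(A_n) + n(H - \delta)$ and $n(H + \delta)$, in line with the bounds on the number of sequences in $S_n$, while the total number of binary sequences of length $n$ is $2^n$.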


8.5.1 Definition. If $X$ is a discrete random variable (or random vector),
define the entropy (also called the uncertainty) of $X$ as

\[
H(X) = -\sum_x p(x) \log p(x)
\]

where $p(x) = P\{X = x\}$. If $X$ and $Y$ are discrete, define the conditional entropy
of $Y$ given $X = x$ as

\[
H(Y \mid X = x) = -\sum_y p(y \mid x) \log p(y \mid x)
\]

where $p(y \mid x) = P\{Y = y \mid X = x\}$; also define the conditional entropy of $Y$
given $X$ as a weighted average of the $H(Y \mid X = x)$, namely,

\[
H(Y \mid X) = \sum_x p(x) H(Y \mid X = x) = -\sum_{x,y} p(x, y) \log p(y \mid x).
\]

The joint entropy of $X$ and $Y$ is defined by

\[
H(X, Y) = -\sum_{x,y} p(x, y) \log p(x, y);
\]

since $H(X, Y)$ is the entropy of the random vector $(X, Y)$, nothing new is
involved.


Note that entropy is always nonnegative; if the random variables are allowed
to have a countably infinite set of values, an entropy of $+\infty$ may be obtained.
Difficulties with events of probability zero are avoided by defining $0 \log 0 = 0$,
$-\log 0 = +\infty$.
It is often convenient to express entropy in terms of the random variables
$p(X)$ introduced at the beginning of the section; we have

\[
H(X) = E[-\log p(X)], \quad H(Y \mid X) = E[-\log p(Y \mid X)].
\]
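
The definitions translate directly into a few lines of code. The following Python sketch uses a hypothetical joint distribution of two binary random variables (an assumption for illustration) and logs to the base 2. It computes $H(Y \mid X)$ in both forms given in 8.5.1, as a weighted average of the $H(Y \mid X = x)$ and as $-\sum p(x, y) \log p(y \mid x)$, so that the two can be compared.

```python
from math import log2

# Hypothetical joint distribution p(x, y) of two binary random variables.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

def marginal_x(joint):
    """p(x) = sum over y of p(x, y)."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

def entropy(pmf):
    """H = -sum p log2 p, with the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def conditional_entropy_weighted(joint):
    """H(Y | X) as the weighted average sum_x p(x) H(Y | X = x)."""
    total = 0.0
    for x, p_x in marginal_x(joint).items():
        if p_x > 0:
            cond = {y: p / p_x for (x2, y), p in joint.items() if x2 == x}
            total += p_x * entropy(cond)
    return total

def conditional_entropy_direct(joint):
    """H(Y | X) as -sum_{x,y} p(x, y) log2 p(y | x)."""
    px = marginal_x(joint)
    return -sum(p * log2(p / px[x]) for (x, _), p in joint.items() if p > 0)

H_X = entropy(marginal_x(joint))
H_XY = entropy(joint)                       # joint entropy: entropy of the vector (X, Y)
H_Y_given_X = conditional_entropy_weighted(joint)
print(H_X, H_XY, H_Y_given_X, conditional_entropy_direct(joint))
```

For this distribution the joint entropy is exactly 1.5 bits, since the three positive probabilities are 1/2, 1/4, and 1/4.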
We now establish a few properties of entropy. 


8.5.2 Theorem. Let $X$, $Y$, and $Z$ be discrete random vectors with finite
entropy.

(a) If $p_1, p_2, \ldots, q_1, q_2, \ldots$ are nonnegative numbers with $\sum_i p_i = \sum_i q_i = 1$,
and either $-\sum_i p_i \log p_i < \infty$ or $-\sum_i p_i \log q_i < \infty$, then

\[
-\sum_i p_i \log p_i \le -\sum_i p_i \log q_i,
\]

with equality iff $p_i = q_i$ for all $i$.

(b) $H(X, Y) \le H(X) + H(Y)$, with equality iff $X$ and $Y$ are independent.

(c) $H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)$.

(d) $H(Y \mid X) \le H(Y)$, with equality iff $X$ and $Y$ are independent.

(e) If $X$ takes on $r$ possible values $x_1, \ldots, x_r$, then $H(X) \le \log r$, with equality
iff $p(x_i) = 1/r$, $i = 1, \ldots, r$.

(f) $H(Y, Z \mid X) \le H(Y \mid X) + H(Z \mid X)$, with equality iff $Y$ and $Z$ are
conditionally independent given $X$, that is,

\[
p(y, z \mid x) = p(y \mid x)\, p(z \mid x) \quad \text{for all } x, y, z.
\]

(g) $H(Y, Z \mid X) = H(Y \mid X) + H(Z \mid X, Y)$.

(h) $H(Z \mid X, Y) \le H(Z \mid X)$, with equality iff $Y$ and $Z$ are conditionally
independent given $X$.


Proof. (a) For convenience we switch to natural logs. Since $x - 1$ is the
tangent to $\log x$ at $x = 1$, we have $\log x \le x - 1$, with equality iff $x = 1$. Thus
$\log(q_i/p_i) \le (q_i/p_i) - 1$, with equality iff $p_i = q_i$; hence, even if $p_i$ or $q_i = 0$,

\[
p_i \log q_i = p_i \log(q_i/p_i) + p_i \log p_i \le q_i - p_i + p_i \log p_i
\]

with equality iff $p_i = q_i$. Sum over $i$ to obtain the desired result. (Note that
if $\sum_i p_i \log q_i = \sum_i p_i \log p_i$, then the sum is finite by hypothesis, so that
$p_i = q_i$ for all $i$.)

(b) By (a), $-\sum_{x,y} p(x, y) \log p(x, y) \le -\sum_{x,y} p(x, y) \log p(x)p(y)$, with
equality iff $p(x, y) = p(x)p(y)$ for all $x, y$.

(c) This follows from the fact that $p(x, y) = p(x)\, p(y \mid x) = p(y)\, p(x \mid y)$.

(d) This is immediate from (b) and (c).

(e) Apply (a) with $p_i = p(x_i)$, and $q_i = 1/r$.

(f) Since $H(Y \mid X) < \infty$ by (d), $H(Y \mid X = x) < \infty$ for each $x$ (such that
$p(x) > 0$), and similarly for $H(Z \mid X = x)$. The proof of (b) shows that

\[
H(Y, Z \mid X = x) \le H(Y \mid X = x) + H(Z \mid X = x),
\]

with equality iff $p(y, z \mid x) = p(y \mid x)\, p(z \mid x)$ for all $y, z$. Multiply by $p(x)$
and sum over $x$ to complete the proof.

(g) This follows from $p(y, z \mid x) = p(y \mid x)\, p(z \mid x, y)$.

(h) Apply (f) and (g).
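
Several parts of 8.5.2 are easy to spot-check numerically. The following Python sketch does so for parts (a), (b), (d), and (e) on small hypothetical distributions (the vectors `p`, `q`, and the matrix `joint` are assumptions chosen for illustration), with logs to the base 2. It is a sanity check on particular numbers, not a substitute for the proof.

```python
from math import log2

def cross_entropy(p, q):
    """-sum_i p_i log2 q_i, with the conventions 0 log 0 = 0 and -log 0 = +infinity."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return float("inf")
            total -= pi * log2(qi)
    return total

def entropy(p):
    """-sum_i p_i log2 p_i."""
    return cross_entropy(p, p)

# Part (a): -sum p_i log p_i <= -sum p_i log q_i.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
gap = cross_entropy(p, q) - entropy(p)      # nonnegative, and zero only when p = q

# Part (e): H(X) <= log r, with equality for the uniform distribution q.
r = len(p)

# Parts (b) and (d) for a hypothetical joint distribution joint[x][y].
joint = [[0.5, 0.25], [0.0, 0.25]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
H_XY = entropy([value for row in joint for value in row])
H_Y_given_X = H_XY - entropy(px)            # uses the chain rule of part (c)

print(gap, entropy(p), log2(r), H_XY, entropy(px) + entropy(py), H_Y_given_X)
```

In this example $X$ and $Y$ are not independent, so the inequalities in (b) and (d) are strict.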


We also need an entropy concept for stationary sequences. 


8.5.3 Definitions and Comments. Let $X_0, X_1, \ldots$ be a stationary sequence
of discrete random variables, and assume that $H(X_0) < \infty$. Define the entropy
of the sequence as

\[
H\{X_n\} = \lim_{n\to\infty} H(X_n \mid X_0, X_1, \ldots, X_{n-1}).
\]

Now

\[
\begin{aligned}
H(X_{n+1} \mid X_0, \ldots, X_n) &\le H(X_{n+1} \mid X_1, \ldots, X_n) && \text{by 8.5.2(h)} \\
&= H(X_n \mid X_0, \ldots, X_{n-1}) && \text{by stationarity;}
\end{aligned}
\]

also by 8.5.2(h), $H(X_n \mid X_0, \ldots, X_{n-1}) \le H(X_n) = H(X_0) < \infty$. Thus the
limit defining $H\{X_n\}$ exists and is finite.

(To apply 8.5.2(h), it must be verified that $H(X_1, \ldots, X_n) < \infty$, but the
proof of 8.5.2(b) shows that if $H(X_i) < \infty$, $i = 1, \ldots, n$, then

\[
H(X_1, \ldots, X_n) \le H(X_1) + \cdots + H(X_n),
\]

with equality iff $X_1, \ldots, X_n$ are independent. In the present case, $H(X_i)
= H(X_0) < \infty$ for all $i$.)
The entropy of $\{X_n\}$ may also be expressed as

\[
H\{X_n\} = \lim_{n\to\infty} \frac{1}{n} H(X_0, \ldots, X_{n-1}).
\]

To see this, observe that by induction using 8.5.2(c),

\[
H(X_0, \ldots, X_{n-1}) = H(X_0) + H(X_1 \mid X_0) + H(X_2 \mid X_0, X_1)
+ \cdots + H(X_{n-1} \mid X_0, \ldots, X_{n-2}).
\]

Thus $n^{-1}H(X_0, \ldots, X_{n-1})$ is the arithmetic average of a sequence
converging to $H\{X_n\}$, and hence converges to $H\{X_n\}$.
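
Both expressions for $H\{X_n\}$ can be compared on a stationary two-state Markov chain. The following Python sketch is an illustration under assumptions made here: the transition matrix `P` is hypothetical, `v` is its stationary distribution, and block entropies are computed by brute-force enumeration of all paths, which is only feasible for short blocks. Conditional entropies are obtained as differences of block entropies, using 8.5.2(c).

```python
from itertools import product
from math import log2

P = [[0.9, 0.1], [0.3, 0.7]]    # hypothetical transition matrix
v = [0.75, 0.25]                # its stationary distribution (v P = v)

def block_entropy(n):
    """H(X_0, ..., X_{n-1}) in bits, by summing over all 2^n paths."""
    total = 0.0
    for path in product(range(2), repeat=n):
        prob = v[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        if prob > 0:
            total -= prob * log2(prob)
    return total

H_block = [block_entropy(n) for n in range(1, 13)]              # H_block[n-1] = H(X_0, ..., X_{n-1})
per_symbol = [h / n for n, h in enumerate(H_block, start=1)]    # n^{-1} H(X_0, ..., X_{n-1})
conditional = [b - a for a, b in zip(H_block, H_block[1:])]     # H(X_n | X_0, ..., X_{n-1}), n >= 1

# For a Markov chain the conditional entropies all equal H(X_1 | X_0).
rate = -sum(v[i] * P[i][j] * log2(P[i][j]) for i in range(2) for j in range(2))
print(rate, conditional[-1], per_symbol[-1])
```

In this sketch the conditional entropies are constant from $n = 1$ on, while the averages $n^{-1}H(X_0, \ldots, X_{n-1})$ decrease toward the same value more slowly, because the first term $H(X_0)$ is larger than the rest.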

We now begin the development of the Shannon-McMillan theorem. It will
be convenient to consider a two-sided shift transformation $T$ with coordinate
random variables $X_n$, where the $X_n$ take values in a countable set. It is assumed
throughout that $H(X_0) < \infty$.

Martingale theory will be significant, as the following result suggests. 


8.5.4 Theorem. Let $Y_0 = -\log p(X_0)$, $Y_k = -\log p(X_0 \mid X_{-1}, \ldots, X_{-k})$,
$k \ge 1$. Then $\{Y_0, Y_1, \ldots\}$ is a nonnegative supermartingale, hence converges
a.e. to an integrable limit function $Y$.


Proof. Since $E(Y_k) = H(X_0 \mid X_{-1}, \ldots, X_{-k}) \le H(X_0) < \infty$, the $Y_k$ are
integrable. Now since all random variables are discrete, we may write

\[
\begin{aligned}
E(Y_{n+1} \mid X_0 = x_0, \ldots, X_{-n} = x_{-n})
&= -\sum_{x_{-(n+1)}} p(x_{-(n+1)} \mid x_0, \ldots, x_{-n}) \log p(x_0 \mid x_{-1}, \ldots, x_{-(n+1)}) \\
&= \sum_{x_{-(n+1)}} \frac{p(x_0, \ldots, x_{-(n+1)})}{p(x_0, \ldots, x_{-n})} \log \frac{p(x_{-1}, \ldots, x_{-(n+1)})}{p(x_0, \ldots, x_{-(n+1)})}.
\end{aligned}
\]

This is of the form $\sum_i a_i \log x_i$, where $a_i \ge 0$, $\sum_i a_i = 1$, and hence is less
than or equal to $\log(\sum_i a_i x_i)$ by concavity of the logarithm. Thus

\[
\begin{aligned}
E(Y_{n+1} \mid X_0 = x_0, \ldots, X_{-n} = x_{-n})
&\le \log \sum_{x_{-(n+1)}} \frac{p(x_{-1}, \ldots, x_{-(n+1)})}{p(x_0, \ldots, x_{-n})} \\
&= \log \frac{p(x_{-1}, \ldots, x_{-n})}{p(x_0, \ldots, x_{-n})} \\
&= -\log p(x_0 \mid x_{-1}, \ldots, x_{-n}).
\end{aligned}
\]

Therefore $E(Y_{n+1} \mid X_0, \ldots, X_{-n}) \le Y_n$, and since $Y_n$ is measurable relative
to the σ-field $\mathcal{F}(X_0, \ldots, X_{-n})$, the result follows.


We will show that the random variables $Y_n$ are uniformly integrable. It will
be convenient to assume that the coordinate space is a subset of the positive
integers; this amounts only to a relabeling, and can be done without changing
the distribution of $(Y_0, Y_1, \ldots)$.
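8.5.5 Lemma. If $r$ is any fixed positive integer, the random variables $W_n
= Y_n I_{\{X_0 \le r\}}$, $n = 0, 1, \ldots$, are uniformly integrable.


Proof. For any positive integer $k$,

\[
\int_{\{W_n \ge k\}} W_n\, dP = \sum_{i=k}^\infty \int_{\{i \le W_n < i+1\}} W_n\, dP
\le \sum_{i=k}^\infty (i + 1) P\{W_n \ge i\}.
\]

But if $W_n \ge i$, then $Y_n \ge i$ and $X_0 \le r$, and it follows that $p(X_0, \ldots, X_{-n}) \le
2^{-i} p(X_{-1}, \ldots, X_{-n})$. Thus

\[
\begin{aligned}
P\{W_n \ge i\} &= \sum_{\{(x_0, \ldots, x_{-n}) :\ -\log p(x_0 \mid x_{-1}, \ldots, x_{-n}) \ge i,\ x_0 \le r\}} p(x_0, \ldots, x_{-n}) \\
&\le 2^{-i} \sum_{x_0 \le r}\ \sum_{(x_{-1}, \ldots, x_{-n})} p(x_{-1}, \ldots, x_{-n}) \le r\, 2^{-i}.
\end{aligned}
\]

Consequently,

\[
\int_{\{W_n \ge k\}} W_n\, dP \le \sum_{i=k}^\infty r(i + 1) 2^{-i} \to 0
\]

as $k \to \infty$, uniformly in $n$.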


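8.5.6 Theorem. The random variables $Y_n$, $n = 0, 1, \ldots$, are uniformly integrable; thus in 8.5.4 we also have $Y_n \to Y$ in $L^1$.

Proof. For any positive integer $k$,

$$
\begin{aligned}
\int_{\{Y_n \ge k\}} Y_n \, dP
&= \int_{\{Y_n \ge k,\ X_0 > r\}} Y_n \, dP + \int_{\{Y_n \ge k,\ X_0 \le r\}} Y_n \, dP \\
&\le \int_{\{X_0 > r\}} Y_n \, dP + \int_{\{W_n \ge k\}} W_n \, dP.
\end{aligned}
$$

By 8.5.5, the second integral on the right approaches 0 as $k \to \infty$, uniformly in $n$ for a fixed $r$. Now

$$
\begin{aligned}
\int_{\{X_0 > r\}} Y_n \, dP
&= -\sum_{x_0 > r} \ \sum_{x_{-1}, \ldots, x_{-n}} p(x_0, \ldots, x_{-n}) \log p(x_0 \mid x_{-1}, \ldots, x_{-n}) \\
&= -\sum_{x_{-1}, \ldots, x_{-n}} p(x_{-1}, \ldots, x_{-n}) \sum_{x_0 > r} p(x_0 \mid x_{-1}, \ldots, x_{-n}) \log p(x_0 \mid x_{-1}, \ldots, x_{-n}).
\end{aligned}
$$

Write $\log p(x_0 \mid x_{-1}, \ldots, x_{-n}) = \log p(x_0) + \log [\,p(x_0 \mid x_{-1}, \ldots, x_{-n}) / p(x_0)\,]$; the contribution due to $\log p(x_0)$ is

$$
-\sum_{x_0 > r} \Bigl[ \sum_{x_{-1}, \ldots, x_{-n}} p(x_0, \ldots, x_{-n}) \log p(x_0) \Bigr] = -\sum_{x_0 > r} p(x_0) \log p(x_0)
$$

and the remaining contribution is

$$
\sum_{x_{-1}, \ldots, x_{-n}} p(x_{-1}, \ldots, x_{-n}) \sum_{x_0 > r} p(x_0 \mid x_{-1}, \ldots, x_{-n}) \log \frac{p(x_0)}{p(x_0 \mid x_{-1}, \ldots, x_{-n})}.
$$

An upper bound to this expression, obtained by switching to natural logs for convenience and using $\log x \le x - 1$, is

$$
\sum_{x_0 > r} p(x_0) - \sum_{x_0 > r} p(x_0) = 0.
$$

Therefore,

$$
\int_{\{X_0 > r\}} Y_n \, dP \le -\sum_{x_0 > r} p(x_0) \log p(x_0) \to 0
$$

as $r \to \infty$, uniformly in $n$, since $H(X_0) < \infty$.

If $\varepsilon > 0$ is given, we may choose a fixed $r$ such that $\int_{\{X_0 > r\}} Y_n \, dP < \varepsilon/2$ for all $n$; then for sufficiently large $k$, $\int_{\{W_n \ge k\}} W_n \, dP < \varepsilon/2$ for all $n$, hence $\int_{\{Y_n \ge k\}} Y_n \, dP < \varepsilon$ for all $n$. □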

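The other basic property of the $Y_n$ that we need is that $\sup_n Y_n$ is integrable. We prove this after a preliminary lemma.

8.5.7 Lemma. If $i$ is a positive integer, define $Y_n^{(i)} = -\log p(X_0 = i \mid X_{-1}, \ldots, X_{-n})$, that is, if $X_{-1} = i_{-1}, \ldots, X_{-n} = i_{-n}$, then

$$
Y_n^{(i)} = -\log P\{X_0 = i \mid X_{-1} = i_{-1}, \ldots, X_{-n} = i_{-n}\}
$$

(define $Y_0^{(i)} = -\log P\{X_0 = i\}$). If $\lambda \ge 0$, let

$$
E_n(\lambda) = \Bigl\{ \max_{0 \le j < n} Y_j \le \lambda < Y_n \Bigr\}, \qquad
E_n^{(i)}(\lambda) = \Bigl\{ \max_{0 \le j < n} Y_j^{(i)} \le \lambda < Y_n^{(i)} \Bigr\}.
$$

If $A_i = \{X_0 = i\}$, then

$$
P(E_n(\lambda) \cap A_i) = P(E_n^{(i)}(\lambda) \cap A_i) \le 2^{-\lambda} P(E_n^{(i)}(\lambda)).
$$

Proof. On $A_i$ we have $Y_n = Y_n^{(i)}$, hence $E_n(\lambda) \cap A_i = E_n^{(i)}(\lambda) \cap A_i$. Now since $E_n^{(i)}(\lambda)$ belongs to the σ-field $\mathcal{F}(X_{-1}, \ldots, X_{-n})$,

$$
\begin{aligned}
P(A_i \cap E_n^{(i)}(\lambda))
&= \int_{E_n^{(i)}(\lambda)} P(A_i \mid X_{-1}, \ldots, X_{-n}) \, dP \\
&= \int_{E_n^{(i)}(\lambda)} p(X_0 = i \mid X_{-1}, \ldots, X_{-n}) \, dP \\
&= \int_{E_n^{(i)}(\lambda)} 2^{-Y_n^{(i)}} \, dP \le 2^{-\lambda} P(E_n^{(i)}(\lambda))
\end{aligned}
$$

by definition of $E_n^{(i)}(\lambda)$. □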
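8.5.8 Theorem. The random variables $Y_n$ satisfy $E[\sup_n Y_n] < \infty$.

Proof. We may write

$$
E\Bigl[\sup_n Y_n\Bigr] \le \sum_{r=0}^{\infty} P\Bigl\{\sup_n Y_n > r\Bigr\}
= \sum_{r=0}^{\infty} \sum_{n=0}^{\infty} P(E_n(r))
= \sum_{r=0}^{\infty} \sum_{n=0}^{\infty} \sum_{i} P(E_n^{(i)}(r) \cap A_i) \quad \text{by 8.5.7.}
$$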
If the coordinate space is finite, there is no difficulty; by 8.5.7,

$$
E\Bigl[\sup_n Y_n\Bigr] \le \sum_{r=0}^{\infty} 2^{-r} \sum_{i} \sum_{n=0}^{\infty} P(E_n^{(i)}(r)) < \infty
$$

since the $E_n^{(i)}(r)$, $n = 0, 1, \ldots$, are disjoint and the sum over $i$ is finite.

In the general case, assume without loss of generality that the numbers $p_i = P(A_i)$ decrease as $i$ increases (if necessary, relabel the elements of the coordinate space). We then have $p_i \le 1/i$ for all $i$, for if $p_i > 1/i$, then $p_1, \ldots, p_i$ are all greater than $1/i$, a contradiction. Let $f$ be a function, to be specified later, from the nonnegative integers to the nonnegative reals, such that $\{r: f(r) < i\}$ is a finite set for each $i$. Then by 8.5.7,

$$
\sum_{n=0}^{\infty} \Bigl[ \sum_{i \le f(r)} P(E_n^{(i)}(r) \cap A_i) \Bigr]
\le 2^{-r} \sum_{i \le f(r)} \Bigl[ \sum_{n=0}^{\infty} P(E_n^{(i)}(r)) \Bigr] \le 2^{-r} f(r)
$$

by disjointness of the $E_n^{(i)}(r)$, $n = 0, 1, \ldots$.

Also by disjointness,

$$
\sum_{n=0}^{\infty} \Bigl[ \sum_{i > f(r)} P(E_n^{(i)}(r) \cap A_i) \Bigr] \le \sum_{i > f(r)} P(A_i) = \sum_{i > f(r)} p_i.
$$

Therefore,

$$
E\Bigl[\sup_n Y_n\Bigr] \le \sum_{r=0}^{\infty} \Bigl[ 2^{-r} f(r) + \sum_{i > f(r)} p_i \Bigr].
$$

The second series is $\sum_{i=1}^{\infty} \sum_{\{r:\, f(r) < i\}} p_i = \sum_{i=1}^{\infty} f_0(i) p_i$, where $f_0(i)$ is the number of nonnegative integers $r$ such that $f(r) < i$. If we set $f(r) = 2^r (r+1)^{-2}$, then the first series converges, and $f(r) < i$ iff $r < \log i + 2 \log (r+1)$.

Choose a positive integer $B$ such that $2x^{-1} \log(x+1) < \tfrac{1}{2}$ for all $x \ge B$. Then:

$$
f(r) < i,\ r \ge B \quad \text{implies} \quad r < \log i + r/2, \ \text{that is,}\ r < 2 \log i,
$$

and

$$
f(r) < i,\ r < B \quad \text{implies} \quad r < B.
$$

Therefore, $f_0(i) \le 2 \log i + B$, $i = 1, 2, \ldots$, hence

$$
\begin{aligned}
\sum_{i=1}^{\infty} f_0(i) p_i
&\le 2 \sum_{i=1}^{\infty} p_i \log i + B \sum_{i=1}^{\infty} p_i \\
&\le 2 \sum_{i=1}^{\infty} p_i \log \frac{1}{p_i} + B \qquad \text{since } p_i \le \frac{1}{i} \\
&= 2 H(X_0) + B < \infty. \qquad \square
\end{aligned}
$$
We may now give the main result.


8.5.9 Shannon-McMillan Theorem. Let $T$ be a two-sided shift transformation with discrete coordinate random variables $X_n$. Assume that $H(X_0) < \infty$, and let $H$ be the entropy of the sequence $\{X_n\}$.

If $Z_n = -n^{-1} \log p(X_0, \ldots, X_{n-1})$, there is an invariant random variable $Z$ such that $E(Z) = H$ and $Z_n \to Z$, almost everywhere and in $L^1$. In particular, if $T$ is ergodic, $Z = H$ a.e.


Proof. If the random variables $Y_n$ are defined as in 8.5.4, we have

$$
\begin{aligned}
Z_n &= -\frac{1}{n} \sum_{k=0}^{n-1} \log p(X_k \mid X_{k-1}, \ldots, X_0) \\
&= \frac{1}{n} \sum_{k=0}^{n-1} Y_k(T^k) \\
&= \frac{1}{n} \sum_{k=0}^{n-1} Y(T^k) + \frac{1}{n} \sum_{k=0}^{n-1} \bigl[ Y_k(T^k) - Y(T^k) \bigr]. \qquad (1)
\end{aligned}
$$

By 8.3.6, 8.3.7, and 8.3.8, the first series in (1) converges to an almost invariant function $\hat{Y}$, a.e. and in $L^1$. By 8.3.9, $E(\hat{Y}) = E(Y)$, and by 8.5.6, $E(Y) = \lim_{n \to \infty} E(Y_n) = H$. The second series converges to 0 in $L^1$ since $Y_k \to Y$ in $L^1$ by 8.5.6, and $\|Y_k(T^k) - Y(T^k)\|_1 = \|Y_k - Y\|_1$ by 1.6.12. Thus if we take $Z = \hat{Y}$, all that remains is to show that $Z_n \to Z$ a.e., and this will follow if we show that the second series in (1) converges to 0 a.e. But for any positive integer $N$, the series is bounded in absolute value by

$$
\frac{1}{n} \sum_{k=0}^{N-1} \bigl| Y_k(T^k) - Y(T^k) \bigr| + \frac{1}{n} \sum_{k=N}^{n-1} \bigl| Y_k(T^k) - Y(T^k) \bigr|. \qquad (2)
$$



The first series approaches 0 a.e. as $n \to \infty$ since $Y_k(T^k) - Y(T^k)$ is integrable, hence finite a.e. If we define

$$
G_N = \sup\{ |Y_k - Y| : k \ge N \},
$$

then $G_N \le 2 \sup_k Y_k$, which is integrable by 8.5.8. Also, since $Y_k \to Y$ a.e., we have $G_N \to 0$ a.e. as $N \to \infty$, hence $E(G_N) \to 0$ by the dominated convergence theorem. But the second series in (2) (call it $h_n$) is bounded by $n^{-1} \sum_{k=0}^{n-1} G_N(T^k)$, which converges to $\hat{G}_N$ a.e. and in $L^1$, by 8.3.6 and 8.3.7.

Finally, given $\varepsilon > 0$, $\delta > 0$, we have $P\{\hat{G}_N \ge \varepsilon\} \le \varepsilon^{-1} E(\hat{G}_N)$ by Chebyshev's inequality, and by 8.3.9, $E(\hat{G}_N) = E(G_N) \to 0$. If we choose $N$ such that $\varepsilon^{-1} E(G_N) < \delta$, then $\hat{G}_N < \varepsilon$ on a set of probability greater than $1 - \delta$, hence $\limsup_{n \to \infty} h_n \le \varepsilon$ on this set. Since $\varepsilon$ and $\delta$ are arbitrary, it follows that $h_n \to 0$ a.e. □


Since the Shannon-McMillan theorem involves only the random variables $X_n$, $n \ge 0$, it holds equally well for a one-sided shift. For an indication of how to prove this formally, see the discussion at the end of Problem 7 in 8.3.
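
As a numerical illustration of 8.5.9 in the simplest ergodic case, the following sketch simulates an independent, identically distributed source and compares $Z_n = -n^{-1} \log p(X_0, \ldots, X_{n-1})$ with the entropy $H$. The distribution `probs`, the sample size, and the function names are assumptions made for this example only; for an i.i.d. source the convergence is just the strong law of large numbers, so this shows the statement of the theorem at work rather than its general proof.

```python
import math
import random


def entropy_bits(probs):
    """Entropy -sum p log2 p of a finite distribution (base-2 logarithms)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sample_entropy_rate(probs, n, rng):
    """Z_n = -(1/n) log2 p(X_0, ..., X_{n-1}) for one simulated i.i.d. path.

    For independent coordinates, p(x_0, ..., x_{n-1}) is the product of the
    marginal probabilities, so the logarithm is a sum over the path.
    """
    symbols = rng.choices(range(len(probs)), weights=probs, k=n)
    return -sum(math.log2(probs[s]) for s in symbols) / n


probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical source distribution
rng = random.Random(0)              # fixed seed for reproducibility
H = entropy_bits(probs)
z = sample_entropy_rate(probs, 200_000, rng)
print(f"H = {H:.4f}, Z_n = {z:.4f}")
```

With a long path the printed value of `Z_n` should be close to `H`; repeating the run with a different seed changes `Z_n` slightly but not the limit.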


Problems

1. (a) In the Shannon-McMillan theorem, give an example in which the limit random variable $Z$ is a.e. constant, but $T$ is not ergodic.

   (b) In the Shannon-McMillan theorem, give an example in which $Z$ is not a.e. constant (of course, $T$ cannot be ergodic).

2. If the coordinate space is finite and $1 < p < \infty$, show that the random variables $Z_n^p$ are uniformly integrable; thus $Z_n \to Z$ in $L^p$.

3. [A short proof of a special case of the Shannon-McMillan theorem (Gallager, 1968).] Let $X_0, X_1, \ldots$ be a stationary sequence of discrete random variables, with $H(X_0) < \infty$. (In this problem, entropy will be expressed using natural logarithms for convenience.) Define an $m$th-order approximation to $p(X_0, \ldots, X_{n-1})$ by

$$
q_m(X_0, \ldots, X_{n-1}) = p(X_0, \ldots, X_{m-1})\, p(X_m \mid X_0, \ldots, X_{m-1}) \cdots p(X_{n-1} \mid X_{n-m-1}, \ldots, X_{n-2}), \quad m = 1, 2, \ldots;\ n > m.
$$

Now note that $|\ln y| = \ln^+ y + \ln^- y = 2 \ln^+ y - \ln y \le 2e^{-1} y - \ln y$, so

$$
E\Bigl[ \Bigl| \frac{1}{n} \ln \frac{q_m}{p} \Bigr| \Bigr] \le \frac{2}{ne} E\Bigl( \frac{q_m}{p} \Bigr) - \frac{1}{n} E\Bigl( \ln \frac{q_m}{p} \Bigr)
$$

where $q_m = q_m(X_0, \ldots, X_{n-1})$, $p = p(X_0, \ldots, X_{n-1})$.

   (a) Show that $E(q_m/p) \le 1$ and

$$
E\Bigl( -\ln \frac{q_m}{p} \Bigr) = H(X_0, \ldots, X_{m-1}) + (n - m) H(X_m \mid X_0, \ldots, X_{m-1}) - H(X_0, \ldots, X_{n-1}).
$$

   Thus

$$
E\Bigl[ \Bigl| \frac{1}{n} \ln \frac{q_m}{p} \Bigr| \Bigr] \le \frac{2}{ne} + \frac{m}{n} \cdot \frac{H(X_0, \ldots, X_{m-1})}{m} + \Bigl( 1 - \frac{m}{n} \Bigr) H(X_m \mid X_0, \ldots, X_{m-1}) - \frac{H(X_0, \ldots, X_{n-1})}{n}.
$$

   (b) Assume $\{X_n\}$ ergodic, that is, the associated shift transformation is ergodic. Show that as $n \to \infty$, $-n^{-1} \ln q_m$ converges a.e. and in $L^1$ to $H(X_m \mid X_0, \ldots, X_{m-1})$.

   (c) Let $H$ be the entropy of the ergodic sequence $\{X_n\}$. Given $\varepsilon > 0$, choose $m$ such that

$$
\Bigl| \frac{H(X_0, \ldots, X_{m-1})}{m} - H \Bigr| < \frac{\varepsilon}{2}
$$

   and

$$
\bigl| H(X_m \mid X_0, \ldots, X_{m-1}) - H \bigr| < \frac{\varepsilon}{2}.
$$

   Show that $E[\,| -n^{-1} \ln p(X_0, \ldots, X_{n-1}) - H |\,] < \varepsilon$ for sufficiently large $n$, thus establishing $L^1$ convergence in the Shannon-McMillan theorem under the hypothesis of ergodicity.


8.6 ENTROPY OF A TRANSFORMATION 

We define in this section the notion of entropy of a measure-preserving 
transformation, and show that two isomorphic transformations have the same 
entropy. For Bernoulli shifts the converse is true: two Bernoulli shifts with 
the same entropy are isomorphic. 


8.6.1 Definitions.
(a) Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{A}$ a finite subfield of $\mathcal{F}$ (such an $\mathcal{A}$ is a σ-field) with atoms $A_1, \ldots, A_k$. The entropy of $\mathcal{A}$ is by definition

$$
H(\mathcal{A}) = -\sum_{i=1}^{k} P(A_i) \log P(A_i).
$$

(b) If $\mathcal{B}$ is another finite subfield of $\mathcal{F}$, with atoms $B_1, \ldots, B_h$, the conditional entropy of $\mathcal{B}$ given $\mathcal{A}$ is

$$
H(\mathcal{B} \mid \mathcal{A}) = -\sum_{i=1}^{k} \sum_{j=1}^{h} P(A_i \cap B_j) \log P(B_j \mid A_i).
$$

There is in fact nothing new in those definitions. If the discrete random variables $X$ and $Y$ are defined by $X = i$ on $A_i$, $Y = j$ on $B_j$, we have $H(\mathcal{A}) = H(X)$ and $H(\mathcal{B} \mid \mathcal{A}) = H(Y \mid X)$.

Since we consider only finite subfields of $\mathcal{F}$, $H(\mathcal{A})$ and $H(\mathcal{B} \mid \mathcal{A})$ are always finite and the properties of entropy given in 8.5.2 can be rewritten in terms of finite subfields.


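8.6.2 Theorem. Let $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ be finite subfields of $\mathcal{F}$. We have the following properties (where $\mathcal{A} \vee \mathcal{B}$ denotes the smallest (finite) field containing $\mathcal{A}$ and $\mathcal{B}$).

(a) $H(\mathcal{A} \vee \mathcal{B}) \le H(\mathcal{A}) + H(\mathcal{B})$ with equality iff $\mathcal{A}$ and $\mathcal{B}$ are independent.

(b) $H(\mathcal{A} \vee \mathcal{B}) = H(\mathcal{A}) + H(\mathcal{B} \mid \mathcal{A}) = H(\mathcal{B}) + H(\mathcal{A} \mid \mathcal{B})$.

(c) Consequently, $H(\mathcal{B}) \ge H(\mathcal{A})$ if $\mathcal{B} \supset \mathcal{A}$.

(d) $H(\mathcal{B} \mid \mathcal{A}) \le H(\mathcal{B})$ with equality iff $\mathcal{A}$ and $\mathcal{B}$ are independent.

(e) If $\mathcal{A}$ has $k$ atoms, $H(\mathcal{A}) \le \log k$, with equality iff each atom of $\mathcal{A}$ has probability $1/k$.

(f) $H(\mathcal{B} \vee \mathcal{C} \mid \mathcal{A}) \le H(\mathcal{B} \mid \mathcal{A}) + H(\mathcal{C} \mid \mathcal{A})$, with equality iff $\mathcal{B}$ and $\mathcal{C}$ are conditionally independent given $\mathcal{A}$.

(g) $H(\mathcal{B} \vee \mathcal{C} \mid \mathcal{A}) = H(\mathcal{B} \mid \mathcal{A}) + H(\mathcal{C} \mid \mathcal{B} \vee \mathcal{A}) = H(\mathcal{C} \mid \mathcal{A}) + H(\mathcal{B} \mid \mathcal{A} \vee \mathcal{C})$.

(h) Consequently, $H(\mathcal{B} \mid \mathcal{A}) \ge H(\mathcal{C} \mid \mathcal{A})$ if $\mathcal{B} \supset \mathcal{C}$.

(i) $H(\mathcal{C} \mid \mathcal{A} \vee \mathcal{B}) \le H(\mathcal{C} \mid \mathcal{A})$, with equality iff $\mathcal{B}$ and $\mathcal{C}$ are conditionally independent given $\mathcal{A}$.

(j) Consequently, $H(\mathcal{C} \mid \mathcal{B}) \le H(\mathcal{C} \mid \mathcal{A})$ if $\mathcal{A} \subset \mathcal{B}$.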
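
Properties (a), (b), and (d) can be checked numerically on a small example. The sketch below represents a finite subfield by the list of its atoms (a partition of a finite sample space) and computes $H(\mathcal{A})$, $H(\mathcal{B} \mid \mathcal{A})$, and $H(\mathcal{A} \vee \mathcal{B})$ directly from the definitions in 8.6.1. The sample space (a fair die with outcomes labeled 0 to 5), the two partitions, and all function names are assumptions chosen for illustration; logarithms are taken to base 2.

```python
import math
from itertools import product


def H(partition, P):
    """Entropy of the finite field whose atoms are the sets in `partition`."""
    total = 0.0
    for atom in partition:
        p = sum(P[w] for w in atom)
        if p > 0:
            total -= p * math.log2(p)
    return total


def join(part_a, part_b):
    """Atoms of the smallest field containing both fields: nonempty intersections."""
    return [a & b for a, b in product(part_a, part_b) if a & b]


def H_cond(part_b, part_a, P):
    """Conditional entropy H(B | A) = -sum P(A_i & B_j) log2 P(B_j | A_i)."""
    total = 0.0
    for a in part_a:
        pa = sum(P[w] for w in a)
        for b in part_b:
            pab = sum(P[w] for w in a & b)
            if pab > 0:
                total -= pab * math.log2(pab / pa)
    return total


# Hypothetical example: a fair die with outcomes 0..5.
P = {w: 1 / 6 for w in range(6)}
A = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]   # "low" versus "high"
B = [frozenset({0, 2, 4}), frozenset({1, 3, 5})]   # even versus odd label

print(H(join(A, B), P), H(A, P) + H_cond(B, A, P))   # two sides of 8.6.2(b)
print(H(A, P) + H(B, P))                              # upper bound in 8.6.2(a)
```

In this example the two fields are not independent, so the inequality in (a) is strict while the chain rule (b) holds with equality up to rounding.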


Entropy is often said to be a measure of uncertainty. We give a few examples 
to justify this statement. 


1. Consider two finite subfields $\mathcal{A}$ and $\mathcal{B}$ of $\mathcal{F}$ and assume that $\mathcal{B} \supset \mathcal{A}$. (Therefore each atom of $\mathcal{B}$ is a subset of an atom of $\mathcal{A}$.) If we know that the outcome $\omega$ is in the atom $A_i$ of $\mathcal{A}$, we still do not know which atom of $\mathcal{B}$ contains $\omega$; but knowing that $\omega$ is in the atom $B_j$ of $\mathcal{B}$ is enough to tell us which atom of $\mathcal{A}$ contains $\omega$. This means that intuitively the uncertainty is greater for the subfield $\mathcal{B}$ than for the subfield $\mathcal{A}$. This agrees with the inequality $H(\mathcal{B}) \ge H(\mathcal{A})$.

2. If we roll a die, the uncertainty of the outcome is intuitively greatest when the die is unbiased, that is, when the 6 atoms corresponding to the 6 outcomes 1, 2, 3, 4, 5, 6 all have the same probability 1/6. This is exactly what 8.6.2(e) states.

3. The inequality 8.6.2(j) states that the more we know the less the uncertainty.


8.6.3 Remarks. Entropy is invariant under measure-preserving transformations: if $T$ is a measure-preserving transformation on $(\Omega, \mathcal{F}, P)$, and if $\mathcal{A}$ and $\mathcal{B}$ are two finite subfields of $\mathcal{F}$, then $T^{-n}\mathcal{A} = \{T^{-n}A: A \in \mathcal{A}\}$ and $T^{-n}\mathcal{B}$ are finite subfields of $\mathcal{F}$, $n = 1, 2, \ldots$. The measure-preserving property of $T$ implies that $H(T^{-n}\mathcal{A}) = H(\mathcal{A})$ and $H(T^{-n}\mathcal{A} \mid T^{-n}\mathcal{B}) = H(\mathcal{A} \mid \mathcal{B})$, $n = 1, 2, \ldots$.

Because of the correspondence between measure-preserving transformations and stationary sequences, the entropy $H\{X_n\}$ of a stationary sequence can also be rewritten in terms of measure-preserving transformations.


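8.6.4 Definition. If $A_1, \ldots, A_k$ are the atoms of $\mathcal{A}$, and $X$ is the random variable defined by $X = i$ on $A_i$, the sequence $X_0 = X$, $X_1 = X \circ T$, ..., $X_n = X \circ T^n$, ... is a stationary sequence and

$$
H\Bigl( T^{-n}\mathcal{A} \Bigm| \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr) = H(X_n \mid X_0, X_1, \ldots, X_{n-1}).
$$

By 8.5.3, $\lim_{n \to \infty} H\bigl( T^{-n}\mathcal{A} \bigm| \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \bigr) = \lim_{n \to \infty} H(X_n \mid X_0, X_1, \ldots, X_{n-1}) = H\{X_n\}$ exists.

The entropy of a measure-preserving transformation $T$ with respect to a finite subfield $\mathcal{A}$ is defined as

$$
H(\mathcal{A}, T) = \lim_{n \to \infty} H\Bigl( T^{-n}\mathcal{A} \Bigm| \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr).
$$

The identity $H\{X_n\} = \lim_{n \to \infty} \frac{1}{n} H(X_0, \ldots, X_{n-1})$, proved in 8.5.3, can be written in terms of $H(\mathcal{A}, T)$.

8.6.5 Theorem.

(a) $H(\mathcal{A}, T) = \lim_{n \to \infty} \frac{1}{n} H\bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \bigr)$.

(b) $H(\mathcal{A}, T) = \lim_{n \to \infty} H\bigl( \mathcal{A} \bigm| \bigvee_{i=1}^{n} T^{-i}\mathcal{A} \bigr)$.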
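Proof. (a) is a consequence of 8.5.3.

(b) By 8.6.2(j) we have $H\bigl( \mathcal{A} \bigm| \bigvee_{i=1}^{n-1} T^{-i}\mathcal{A} \bigr) \ge H\bigl( \mathcal{A} \bigm| \bigvee_{i=1}^{n} T^{-i}\mathcal{A} \bigr) \ge 0$, and therefore $\lim_{n \to \infty} H\bigl( \mathcal{A} \bigm| \bigvee_{i=1}^{n} T^{-i}\mathcal{A} \bigr)$ exists. Since

$$
\begin{aligned}
H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr)
&= H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n-1} T^{-i}\mathcal{A} \Bigr) + H\Bigl( \bigvee_{i=1}^{n-1} T^{-i}\mathcal{A} \Bigr) && \text{by 8.6.2(b)} \\
&= H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n-1} T^{-i}\mathcal{A} \Bigr) + H\Bigl( \bigvee_{i=0}^{n-2} T^{-i}\mathcal{A} \Bigr) && \text{by 8.6.3} \\
&= H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n-1} T^{-i}\mathcal{A} \Bigr) + H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n-2} T^{-i}\mathcal{A} \Bigr) + \cdots + H(\mathcal{A} \mid T^{-1}\mathcal{A}) + H(T^{-1}\mathcal{A}),
\end{aligned}
$$

the argument in 8.5.3 shows that

$$
\lim_{n \to \infty} H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n-1} T^{-i}\mathcal{A} \Bigr) = \lim_{n \to \infty} \frac{1}{n} H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr) = H(\mathcal{A}, T). \qquad \square
$$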
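
The two limits in 8.6.5 can be observed numerically through the correspondence with stationary sequences. The sketch below uses a two-state stationary Markov chain (the transition matrix `P` and its stationary distribution `mu` are assumptions chosen for this example), computes the block entropy $H(X_0, \ldots, X_{n-1})$ by brute-force enumeration, and compares the normalized block entropy and the successive differences with the conditional entropy $H(X_1 \mid X_0)$, which is the entropy of the sequence for a stationary Markov chain.

```python
import math
from itertools import product


def block_entropy(P, mu, n):
    """H(X_0, ..., X_{n-1}) in bits for a stationary Markov chain.

    P is the transition matrix, mu the stationary initial distribution.
    All len(mu)**n paths are enumerated, so keep n small.
    """
    total = 0.0
    for seq in product(range(len(mu)), repeat=n):
        p = mu[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        if p > 0:
            total -= p * math.log2(p)
    return total


def markov_entropy_rate(P, mu):
    """H(X_1 | X_0) = -sum_a mu[a] sum_b P[a][b] log2 P[a][b]."""
    return -sum(
        mu[a] * P[a][b] * math.log2(P[a][b])
        for a in range(len(mu))
        for b in range(len(mu))
        if P[a][b] > 0
    )


P = [[0.9, 0.1], [0.5, 0.5]]   # hypothetical transition matrix
mu = [5 / 6, 1 / 6]            # its stationary distribution (mu P = mu)
rate = markov_entropy_rate(P, mu)

for n in (2, 4, 8, 12):
    h_n = block_entropy(P, mu, n)
    h_prev = block_entropy(P, mu, n - 1)
    # (1/n) H(X_0..X_{n-1}) as in 8.6.5(a); the difference as in 8.6.4
    print(n, h_n / n, h_n - h_prev, rate)
```

For a Markov chain the differences equal the limit from the first step on, while the normalized block entropies decrease to it more slowly.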


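8.6.6 Corollary. If $T$ is an invertible (measurable, one-to-one, onto, with $T^{-1}$ measurable) measure-preserving transformation, then

$$
H(\mathcal{A}, T) = \lim_{n \to \infty} H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n} T^{i}\mathcal{A} \Bigr) = \lim_{n \to \infty} H\Bigl( T^{n}\mathcal{A} \Bigm| \bigvee_{i=0}^{n-1} T^{i}\mathcal{A} \Bigr).
$$

Notice that we can rewrite this corollary in terms of stationary sequences. Since $T$ is invertible, we can define the two-sided stationary sequence $X_0 = X$, $X_n = X \circ T^n$, $n = \pm 1, \pm 2, \ldots$, where $X = i$ on the atom $A_i$ of $\mathcal{A}$. The corollary then becomes $H\{X_n\} = \lim_{n \to \infty} H(X_0 \mid X_{-1}, \ldots, X_{-n}) = \lim_{n \to \infty} H(X_{-n} \mid X_0, \ldots, X_{-(n-1)})$. This means that in the definition of $H\{X_n\}$ we can run the time backwards.

Proof. By 8.6.3,

$$
H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n} T^{i}\mathcal{A} \Bigr)
= H\Bigl( T^{-n}\mathcal{A} \Bigm| \bigvee_{i=-(n-1)}^{0} T^{i}\mathcal{A} \Bigr)
= H\Bigl( T^{-n}\mathcal{A} \Bigm| \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr),
$$

and

$$
H\Bigl( T^{n}\mathcal{A} \Bigm| \bigvee_{i=0}^{n-1} T^{i}\mathcal{A} \Bigr)
= H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=-n}^{-1} T^{i}\mathcal{A} \Bigr)
= H\Bigl( \mathcal{A} \Bigm| \bigvee_{i=1}^{n} T^{-i}\mathcal{A} \Bigr). \qquad \square
$$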


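8.6.7 Remark. If $\mathcal{A}$ and $\mathcal{B}$ are two finite subfields of $\mathcal{F}$, we have

(a) $H(\mathcal{A}, T) \le H(\mathcal{B}, T)$ if $\mathcal{A} \subset \mathcal{B}$.
(b) $H(\mathcal{A} \vee \mathcal{B}, T) \le H(\mathcal{A}, T) + H(\mathcal{B} \mid \mathcal{A})$.

Here, (a) is an easy consequence of 8.6.2(c) and 8.6.5(a).
For (b), notice that

$$
\begin{aligned}
H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}(\mathcal{A} \vee \mathcal{B}) \Bigr)
&= H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \vee \bigvee_{i=0}^{n-1} T^{-i}\mathcal{B} \Bigr) \\
&= H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr) + H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{B} \Bigm| \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr)
\end{aligned}
$$

by 8.6.2(b).
By 8.6.2(f), (j), and 8.6.3 we have

$$
\begin{aligned}
H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{B} \Bigm| \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr)
&\le \sum_{i=0}^{n-1} H\Bigl( T^{-i}\mathcal{B} \Bigm| \bigvee_{j=0}^{n-1} T^{-j}\mathcal{A} \Bigr) \\
&\le \sum_{i=0}^{n-1} H(T^{-i}\mathcal{B} \mid T^{-i}\mathcal{A}) = n H(\mathcal{B} \mid \mathcal{A}).
\end{aligned}
$$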


We are now ready to define the entropy of a measure-preserving 
transformation in such a way that it is invariant under isomorphism. We first 
specify what we mean by isomorphic transformations. 


8.6.8 Definition (Isomorphic Transformations). Let $(\Omega_1, \mathcal{F}_1, P_1, T_1)$ and $(\Omega_2, \mathcal{F}_2, P_2, T_2)$ be two probability spaces with measure-preserving transformations $T_1$ and $T_2$; $(\Omega_1, \mathcal{F}_1, P_1, T_1)$ and $(\Omega_2, \mathcal{F}_2, P_2, T_2)$ are isomorphic iff there exist $\Omega_1' \in \mathcal{F}_1$ and $\Omega_2' \in \mathcal{F}_2$, and a mapping $\Phi: \Omega_1' \to \Omega_2'$ such that:

(a) $P_1(\Omega_1') = P_2(\Omega_2') = 1$;

(b) the mapping $\Phi$ is one-to-one and onto; for any $B_1 \subset \Omega_1'$, $B_2 = \Phi(B_1) \in \mathcal{F}_2$ iff $B_1 \in \mathcal{F}_1$;

(c) for any subset $B_1 \in \mathcal{F}_1$ of $\Omega_1'$, $P_2(\Phi(B_1)) = P_1(B_1)$;

(d) $T_1(\Omega_1') \subset \Omega_1'$ and $T_2(\Omega_2') \subset \Omega_2'$;

(e) $\Phi(T_1 \omega) = T_2(\Phi \omega)$, for each $\omega$ in $\Omega_1'$.


Properties (b) and (c) state that the probability spaces $(\Omega_1', \mathcal{F}_1', P_1)$ and $(\Omega_2', \mathcal{F}_2', P_2)$ are isomorphic [where $\mathcal{F}_1'$ (resp. $\mathcal{F}_2'$) denotes the σ-field consisting of the subsets of $\Omega_1'$ (resp. $\Omega_2'$) that are in $\mathcal{F}_1$ (resp. in $\mathcal{F}_2$)]. Property (d) is purely technical; it allows us to limit ourselves to what happens in $\Omega_1'$ and $\Omega_2'$. Property (e) states that $T_1$ and $T_2$, considered as transformations on $\Omega_1'$ and $\Omega_2'$, are compatible with the isomorphism $\Phi$. We could have defined the isomorphism of $(\Omega_1, \mathcal{F}_1, P_1, T_1)$ and $(\Omega_2, \mathcal{F}_2, P_2, T_2)$ by imposing that $\Omega_1' = \Omega_1$ and $\Omega_2' = \Omega_2$, but nothing is changed in the structure of a probability space if we add or remove a set of measure zero. It is therefore natural to add the property (a) to the definition.


8.6.9 Definition. Let $T$ be a measure-preserving transformation on a probability space $(\Omega, \mathcal{F}, P)$; the entropy of $T$ is

$$
H(T) = \sup_{\mathcal{A}} H(\mathcal{A}, T),
$$

where the sup is taken over all finite subfields $\mathcal{A}$ of $\mathcal{F}$.


8.6.10 Theorem. Let $T_1$ and $T_2$ be measure-preserving transformations on $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$, respectively. Suppose that $(\Omega_1, \mathcal{F}_1, P_1, T_1)$ and $(\Omega_2, \mathcal{F}_2, P_2, T_2)$ are isomorphic. Then $H(T_1) = H(T_2)$.


Proof. The result is a consequence of the following remarks (we use the notations of 8.6.8).

(a) The entropies of $T_1$ in $\Omega_1$ and $\Omega_1'$ are the same because $P_1(\Omega_1 - \Omega_1') = 0$. The same is true for the entropies of $T_2$ in $\Omega_2$ and $\Omega_2'$.

(b) For any finite subfield $\mathcal{A}_1$ of $\mathcal{F}_1'$, $\Phi\mathcal{A}_1$ is a finite subfield of $\mathcal{F}_2'$, and any finite subfield of $\mathcal{F}_2'$ is of this form.

(c) If $\mathcal{A}_1$ is a finite subfield of $\mathcal{F}_1'$, then $H(\mathcal{A}_1, T_1) = H(\Phi\mathcal{A}_1, T_2)$. □

To conclude this section we give two results, which in certain cases allow 
us to compute H (T) easily. 
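8.6.11 Theorem. If $T$ is a measure-preserving transformation on a probability space $(\Omega, \mathcal{F}, P)$, and if the finite subfield $\mathcal{A}$ of $\mathcal{F}$ is such that the subfields $T^{-n}\mathcal{A}$, $n = 0, 1, \ldots$ are independent, then $H(\mathcal{A}, T) = H(\mathcal{A})$. (If $T$ is invertible, the independence of the $T^{-n}\mathcal{A}$ for $n = 0, 1, \ldots$ is equivalent to the independence of the $T^{n}\mathcal{A}$ for $n = 0, \pm 1, \pm 2, \ldots$ by the measure-preserving property of $T$.)


Proof. We have

$$
\begin{aligned}
H(\mathcal{A}, T) &= \lim_{n \to \infty} \frac{1}{n} H\Bigl( \bigvee_{i=0}^{n-1} T^{-i}\mathcal{A} \Bigr) && \text{by 8.6.5} \\
&= \lim_{n \to \infty} \frac{1}{n} \bigl[ H(\mathcal{A}) + H(T^{-1}\mathcal{A}) + \cdots + H(T^{-(n-1)}\mathcal{A}) \bigr] && \text{by 8.6.2(a)} \\
&= \lim_{n \to \infty} \frac{1}{n} \bigl[ H(\mathcal{A}) + H(\mathcal{A}) + \cdots + H(\mathcal{A}) \bigr] && \text{by 8.6.3} \\
&= H(\mathcal{A}). \qquad \square
\end{aligned}
$$

We will now show that if $\bigcup_{n=-\infty}^{\infty} T^{n}\mathcal{A}$ generates $\mathcal{F}$, we have $H(T) = H(\mathcal{A}, T)$. We start by proving two lemmas.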


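8.6.12 Lemma. Let $T$ be an invertible, measure-preserving transformation on the probability space $(\Omega, \mathcal{F}, P)$ and $\mathcal{A}$ be a finite subfield of $\mathcal{F}$. We have for every $n = 0, 1, \ldots$,

$$
H\Bigl( \bigvee_{i=-n}^{n} T^{i}\mathcal{A},\ T \Bigr) = H(\mathcal{A}, T).
$$

Proof.

$$
\begin{aligned}
H\Bigl( \bigvee_{i=-n}^{n} T^{i}\mathcal{A},\ T \Bigr)
&= \lim_{k \to \infty} \frac{1}{k} H\Bigl( \bigvee_{i=-n-k+1}^{n} T^{i}\mathcal{A} \Bigr) && \text{by 8.6.5(a)} \\
&= \lim_{k \to \infty} \frac{1}{k} H\Bigl( \bigvee_{i=0}^{2n+k-1} T^{-i}\mathcal{A} \Bigr) && \text{by 8.6.3} \\
&= \lim_{N \to \infty} \frac{1}{N - 2n} H\Bigl( \bigvee_{i=0}^{N-1} T^{-i}\mathcal{A} \Bigr) \\
&= H(\mathcal{A}, T). \qquad \square
\end{aligned}
$$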
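8.6.13 Lemma. Let $\mathcal{A}$ and $\mathcal{B}$ be two finite subfields of $\mathcal{F}$. Assume that both $\mathcal{A}$ and $\mathcal{B}$ have $k$ atoms, denoted by $A_1, \ldots, A_k$ and $B_1, \ldots, B_k$, respectively. Then for any $\varepsilon > 0$ there exists a $\delta > 0$ such that $\sum_{i=1}^{k} P(A_i \,\triangle\, B_i) < \delta$ implies $|H(\mathcal{A}, T) - H(\mathcal{B}, T)| < \varepsilon$.


Proof. Let $C_0 = \bigcup_{i=1}^{k} (A_i \cap B_i)$ and $C_i = A_i - C_0$, $i = 1, \ldots, k$. We have $P(C_0) > 1 - \delta$. If $P(C_0) = 1$, then $P(A_i \,\triangle\, B_i) = 0$ for all $i$. In this case $H(\mathcal{A}, T) = H(\mathcal{B}, T)$. Therefore we can assume that $P(C_0) < 1$. If we apply 8.6.2(e) to the probability $P'(D) = P(D)/(1 - P(C_0))$ on $\Omega - C_0$, we see that

$$
-\sum_{i=1}^{k} P'(C_i) \log P'(C_i) \le \log k.
$$

Replacing $P'(C_i)$ by $P(C_i)/(1 - P(C_0))$ we get

$$
\begin{aligned}
-\sum_{i=1}^{k} P(C_i) \log P(C_i) &\le (1 - P(C_0)) \log k - (1 - P(C_0)) \log (1 - P(C_0)) \\
&\le \delta \log k - \delta \log \delta
\end{aligned}
$$

if $\delta < 1/e$. And, therefore, if $\mathcal{C}$ is the finite subfield generated by the $C_i$, $i = 0, \ldots, k$, we have

$$
\begin{aligned}
H(\mathcal{C}) &\le -P(C_0) \log P(C_0) + \delta \log k - \delta \log \delta \\
&\le -(1 - \delta) \log (1 - \delta) - \delta \log \delta + \delta \log k
\end{aligned}
$$

if $\delta < 1/2$. We can choose $\delta$ in such a way that $H(\mathcal{C}) < \varepsilon$. We then have by 8.6.7(a)

$$
H(\mathcal{A}, T) \le H(\mathcal{A} \vee \mathcal{C}, T) = H(\mathcal{B} \vee \mathcal{C}, T)
$$

since $\mathcal{A} \vee \mathcal{C} = \mathcal{B} \vee \mathcal{C}$. By 8.6.7(b) we have $H(\mathcal{B} \vee \mathcal{C}, T) \le H(\mathcal{B}, T) + H(\mathcal{C})$. Therefore $H(\mathcal{A}, T) \le H(\mathcal{B}, T) + \varepsilon$, for sufficiently small $\delta$. We can similarly prove that $H(\mathcal{B}, T) \le H(\mathcal{A}, T) + \varepsilon$, if $\delta$ is small enough. □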


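8.6.14 Theorem (Kolmogorov-Sinai). If $T$ is an invertible, measure-preserving transformation on the probability space $(\Omega, \mathcal{F}, P)$, and if the finite subfield $\mathcal{A}$ is such that the σ-field generated by the field $\bigcup_{n=0}^{\infty} \bigvee_{i=-n}^{n} T^{i}\mathcal{A}$ is equal to $\mathcal{F}$, then $H(T) = H(\mathcal{A}, T)$.


Proof. Let $\mathcal{B}$ be a finite subfield of $\mathcal{F}$ with atoms $B_1, \ldots, B_k$. Given any $\delta > 0$ we can find $C_1, \ldots, C_k \in \bigcup_{n=0}^{\infty} \bigvee_{i=-n}^{n} T^{i}\mathcal{A}$ such that $P(B_i \,\triangle\, C_i) < \delta$,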
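$i = 1, \ldots, k$. Let $N$ be such that all the $C_i$ are in $\bigvee_{i=-N}^{N} T^{i}\mathcal{A}$. Let $C_1' = C_1$, $C_i' = C_i - \bigcup_{j=1}^{i-1} C_j$, and denote by $\mathcal{C}$ the subfield of $\mathcal{F}$ having the $C_i'$ as atoms. We have $\sum_{i=1}^{k} |P(B_i) - P(C_i')| < k^2 \delta$. Given an $\varepsilon > 0$, 8.6.13 allows us to choose $\delta > 0$ such that $|H(\mathcal{B}, T) - H(\mathcal{C}, T)| < \varepsilon$. And we have

$$
\begin{aligned}
H(\mathcal{B}, T) &\le H(\mathcal{C}, T) + \varepsilon \\
&\le H\Bigl( \bigvee_{i=-N}^{N} T^{i}\mathcal{A},\ T \Bigr) + \varepsilon \\
&= H(\mathcal{A}, T) + \varepsilon && \text{by 8.6.12.}
\end{aligned}
$$

Since this is true for any finite subfield $\mathcal{B}$ of $\mathcal{F}$, we have $H(T) = H(\mathcal{A}, T)$. □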


8.6.15 Corollary. If $T$ is an invertible, measure-preserving transformation on a probability space $(\Omega, \mathcal{F}, P)$, and $\mathcal{A}$ is a finite subfield of $\mathcal{F}$ such that the subfields $T^{n}\mathcal{A}$, $n = 0, \pm 1, \pm 2, \ldots$ are independent and the σ-field they generate is equal to $\mathcal{F}$, we have $H(T) = H(\mathcal{A})$.


8.7 BERNOULLI SHIFTS 
We can now apply the results of 8.6 to compute the entropy of a Bernoulli 
shift. 


8.7.1 Definition (Bernoulli Shift). Let $S$ be the set $\{0, 1, \ldots, k-1\}$, $\mathcal{S}$ be the σ-field of all the subsets of $S$, and $\pi = (p_0, p_1, \ldots, p_{k-1})$ be the probability measure on $(S, \mathcal{S})$ such that $\pi(i) = p_i$, $i = 0, 1, \ldots, k-1$. We consider the probability space $(\Omega_\pi, \mathcal{F}_\pi, P_\pi)$, where $\Omega_\pi$ consists of all doubly infinite sequences $\omega = (\ldots, \omega_{-1}, \omega_0, \omega_1, \ldots)$ of symbols $0, 1, \ldots, k-1$. The σ-field $\mathcal{F}_\pi$ is generated by the cylinders $C = \{\omega: \omega_i = a_i,\ -m \le i \le n\}$, where the $a_i$ can be any elements of $S$. The probability of such a cylinder is $P_\pi(C) = \prod_{i=-m}^{n} \pi(a_i)$. (In other words, the coordinate variables $\alpha_n(\omega) = \omega_n$ generate $\mathcal{F}_\pi$ and are independent.)

The Bernoulli shift with distribution $\pi$ is the transformation $T_\pi$ defined by

$$
\alpha_n(T_\pi \omega) = \alpha_{n+1}(\omega).
$$

(The transformation $T_\pi$ shifts the coordinates of $\omega$ to the left.) Here, $T_\pi$ is measure-preserving and invertible because if $C$ is a cylinder, $T_\pi^{-1}C$ and $T_\pi C$ are also cylinders and have the same $P_\pi$-probability as $C$.


8.7.2 Theorem (Entropy of a Bernoulli Shift). The Bernoulli shift $T_\pi$ with distribution $\pi = (p_0, \ldots, p_{k-1})$ has entropy $H(T_\pi) = -\sum_{i=0}^{k-1} p_i \log p_i$.

Proof. Let $\mathcal{A}$ be the finite subfield generated by the projection $\alpha_0$. Thus $\mathcal{A}$ is generated by the cylinders $\{\omega: \omega_0 = i\}$, $i = 0, \ldots, k-1$. The subfield $T_\pi^{-n}\mathcal{A}$ is generated by the projection $\alpha_n$. Therefore the subfields $T_\pi^{n}\mathcal{A}$, $n = 0, \pm 1, \ldots$, are independent and generate $\mathcal{F}_\pi$, and according to 8.6.15

$$
H(T_\pi) = H(\mathcal{A}) = -\sum_{i=0}^{k-1} p_i \log p_i. \qquad \square
$$

Except for the trivial cases where $k = 1$, or the distribution $\pi$ gives measure one to one of the symbols, the probability spaces $(\Omega_\pi, \mathcal{F}_\pi, P_\pi)$ are all isomorphic to the probability space $([0, 1]^2, \mathcal{B}([0, 1]^2), \mu)$ [$\mu$ is Lebesgue measure] and therefore isomorphic to each other.


We recall that two probability spaces $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$ are isomorphic iff properties (a), (b), (c) of 8.6.8 are satisfied.


8.7.3 Theorem. For any integer $k > 1$, and any distribution $\pi$ on $\{0, 1, \ldots, k-1\}$ such that $p_i \ne 1$, $i = 0, 1, \ldots, k-1$, the probability space $(\Omega_\pi, \mathcal{F}_\pi, P_\pi)$ is isomorphic to the probability space $([0, 1]^2, \mathcal{B}([0, 1]^2), \mu)$ where $\mu$ is Lebesgue measure.

If $k = 1$ or if one of the $p_i$ is equal to 1, the probability space $(\Omega_\pi, \mathcal{F}_\pi, P_\pi)$ is essentially composed of one element and cannot be isomorphic to the unit square.


Proof. If the distribution $\pi$ gives the same probability $1/k$ to each of the $k$ symbols, the proof is easy. For each point $a$ of the unit square we write the expansion in base $k$ of the $x$ and $y$ coordinates of $a$,

$$
x = .\omega_0 \omega_1 \omega_2 \cdots, \qquad y = .\omega_{-1} \omega_{-2} \cdots,
$$

and we define $\phi a$ as

$$
\phi a = \omega = (\ldots, \omega_{-2}, \omega_{-1}, \omega_0, \omega_1, \omega_2, \ldots).
$$

If one of the coordinates of $a$ is of the form $n/k^m$ it has two expansions in base $k$ (since $1/k^m = \sum_{j=m+1}^{\infty} (k-1)/k^j$), and $\phi a$ is not well defined. We have to first remove such points from the unit square, but we are still left with a Borel set $E$ of measure 1. It is easy to check then that $P_\pi(\phi E) = 1$, and that $\phi$ is an isomorphism from $E$ to $\phi E$.

If the $p_i$ are not all equal, we have to modify the proof to make sure that $P_\pi(\phi A) = \mu(A)$ for all Borel subsets $A$ of $E$. We construct the image $(\ldots, \omega_{-1}, \omega_0, \omega_1, \ldots) = \phi(a)$ of a point $a$ of $[0, 1]^2$ in the following way. We cut the square $[0, 1]^2$ into $k$ vertical open bands $A_0, A_1, \ldots, A_{k-1}$, with widths proportional to $p_0, p_1, \ldots, p_{k-1}$ so that

$$
A_0 = \{(x, y): 0 < x < p_0\}, \qquad A_i = \Bigl\{ (x, y): \sum_{j=0}^{i-1} p_j < x < \sum_{j=0}^{i} p_j \Bigr\},
$$

(when the $p_i$ are all equal to $1/k$, finding the first term of the expansion of $x$ in base $k$ requires dividing $[0, 1]^2$ into $k$ equal vertical bands). If the point $a$ is in $A_i$, we impose $\omega_0 = i$ for the 0th coordinate of $\phi a$. If the point $a$ is on the vertical boundaries of one of the $A_i$ we do not define $\omega_0$.

We then divide the band $A_i$ into $k$ vertical open bands with widths proportional to $p_0, p_1, \ldots, p_{k-1}$; if the point $a$ is in the $j$th band we impose $\omega_1 = j$ for the first coordinate of $\phi a$; again we do not try to define $\omega_1$ when the point $a$ falls on the vertical boundaries of one of those new bands. We iterate this procedure to define the values of $\omega_n$ for $n = 0, 1, \ldots$. To define the $\omega_{-n}$ for $n = 1, 2, \ldots$, we use the same procedure but we divide the square $[0, 1]^2$ into horizontal open bands.

Note that $\phi a$ is not defined for the points $a$ that fall on the boundary of one of the bands so defined. Since there are only a countable number of those bands, the function $\phi$ is defined on a set $E$ of Lebesgue measure 1. The function $\phi$ is not onto. The sequences that contain an infinite number of consecutive 0's or an infinite number of consecutive $(k-1)$'s are never images of elements of $E$. Let $A_n$ be the set $\{\omega: \omega_i = 0 \text{ for all } i > n\}$, and $B_n$ be the set $\{\omega: \omega_i = k-1 \text{ for all } i > n\}$, and define the sets $A_{-n}$ and $B_{-n}$ similarly by replacing the condition $i > n$ by $i < -n$. The set of elements of $\Omega_\pi$ that are not images under $\phi$ of elements of $E$ is $\bigcup_{n=-\infty}^{\infty} (A_n \cup B_n)$. Each $A_n$ and each $B_n$ is in $\mathcal{F}_\pi$ and has measure 0, therefore $\Omega_\pi' = \phi E$ is a measurable subset of $P_\pi$-measure one. Thus the restriction of $\phi$ to $E$ is one-to-one.

By construction, the image of an open rectangle delimited by a vertical and a horizontal band constructed above is of the form $\Omega_\pi' \cap \{\omega: \omega_i = a_i,\ -m \le i \le n\}$, and has as $P_\pi$-measure the Lebesgue measure of the rectangle. Since the bands generate the Borel σ-field on the unit square, and the cylinders generate $\mathcal{F}_\pi$, the probability space $(\Omega_\pi, \mathcal{F}_\pi, P_\pi)$ is isomorphic to the unit square with Lebesgue measure. □
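
The band construction in this proof can be sketched in code for a single coordinate. The function below (its name, signature, and the use of exact rational arithmetic are choices made for this illustration, not part of the text) takes the $x$ coordinate of a point and a distribution $p$, and returns the first $n$ symbols $\omega_0, \ldots, \omega_{n-1}$ together with the open interval of $x$ values sharing those symbols. The length of that interval is the product of the corresponding $p_i$, which is the measure-matching property used in the proof.

```python
from fractions import Fraction


def band_digits(x, p, n):
    """First n symbols assigned to x by repeated splitting into open bands.

    x : Fraction in (0, 1); p : list of Fractions summing to 1.
    Returns (digits, (lo, hi)) where (lo, hi) is the final open band.
    Raises ValueError if x falls on a band boundary, where the map
    is left undefined.
    """
    lo, hi = Fraction(0), Fraction(1)
    digits = []
    for _ in range(n):
        left = lo
        for i, p_i in enumerate(p):
            right = left + p_i * (hi - lo)   # band i has relative width p_i
            if left < x < right:
                digits.append(i)
                lo, hi = left, right
                break
            left = right
        else:
            raise ValueError("x lies on a band boundary")
    return digits, (lo, hi)


# Equal weights reproduce the base-k expansion (here k = 2, x = 1/3).
half = Fraction(1, 2)
print(band_digits(Fraction(1, 3), [half, half], 6)[0])

# Unequal weights: the final band has width p[1] * p[2] * p[0].
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
digits, (lo, hi) = band_digits(Fraction(7, 10), p, 3)
print(digits, hi - lo)
```

The symbols $\omega_{-1}, \omega_{-2}, \ldots$ would be obtained in the same way from the $y$ coordinate, using horizontal bands.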


Now that we know that essentially all the probability spaces $(\Omega_\pi, \mathcal{F}_\pi, P_\pi)$ are isomorphic, we can start wondering whether some of the Bernoulli shifts are themselves isomorphic. If two Bernoulli shifts have different entropy, they cannot be isomorphic by 8.6.10. Thus the Bernoulli shift with distribution (1/3, 1/3, 1/3) is not isomorphic to the Bernoulli shift with distribution (1/4, 1/4, 1/4, 1/4). But what happens if two Bernoulli shifts have the same entropy? For example, the Bernoulli shifts with distributions (1/2, 1/8, 1/8, 1/8, 1/8) and (1/4, 1/4, 1/4, 1/4) both have entropy 2; are they isomorphic? The answer is given in the following theorem, which we state without proof.


8.7.4 Theorem. If two Bernoulli shifts have the same entropy, they are 
isomorphic in the sense of 8.6.8. 
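
The entropies quoted in the examples before 8.7.4 follow directly from the formula in 8.7.2. A minimal sketch, with base-2 logarithms and a function name chosen only for this illustration:

```python
import math


def bernoulli_shift_entropy(pi):
    """Entropy -sum p_i log2 p_i of the Bernoulli shift with distribution pi."""
    return -sum(p * math.log2(p) for p in pi if p > 0)


thirds = [1 / 3] * 3
quarters = [1 / 4] * 4
mixed = [1 / 2, 1 / 8, 1 / 8, 1 / 8, 1 / 8]

for name, dist in [("thirds", thirds), ("quarters", quarters), ("mixed", mixed)]:
    print(name, bernoulli_shift_entropy(dist))
```

The first two values differ (log 3 versus 2), so by 8.6.10 those shifts are not isomorphic; the last two coincide, which is the situation covered by 8.7.4.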


8.8 REFERENCES 

The interested reader can find elementary proofs of 8.7.4 (too long to include 
in this chapter) in Shields (1973) and in Cornfeld, Fomin, and Sinai (1982). 

Some standard references on ergodic theory are Billingsley (1965), Halmos (1956), and Jacobs (1962). The pointwise ergodic theorem can be generalized in several ways. If $T$ is a positive contraction operator on $L^1$, not necessarily arising from a measure-preserving transformation, one can investigate convergence of the sequence of averages $n^{-1}(f + Tf + \cdots + T^{n-1}f)$. In fact the arithmetic average can be replaced by a ratio of the form

$$
(f + Tf + \cdots + T^{n-1}f) / (g + Tg + \cdots + T^{n-1}g),
$$

where $f$ and $g$ belong to $L^1$. For results of this type (specifically, the Dunford-Schwartz ergodic theorem and the Chacon-Ornstein theorem), see Garsia (1970).

A discussion of the Shannon-McMillan theorem for a finite coordinate space may be found in Billingsley (1965). McMillan (1953) proved $L^1$ convergence and Breiman (1957) obtained a.e. convergence; Shannon's original paper (1948) considered convergence in probability for functions of a finite Markov chain. All these results were for finite coordinate spaces; the extension to the countable case is due to Chung (1961). For applications to information theory, see Ash (1965) and Gallager (1968).

The Shannon-McMillan theorem has been generalized to a more abstract setting. For a survey of this area and a unified approach to the various results, see Kieffer (1970).

The definition of the entropy of a transformation as an invariant under isomorphism is due to Kolmogorov (1958; 1959). The notion of Bernoulli shift can be generalized by allowing the σ-fields $\mathcal{A}$ to have a countable number of atoms. The entropy of such a shift can be infinite. The proof that Bernoulli shifts with the same entropy are isomorphic is due to Ornstein (1970a) for shifts with finite entropy, and to Ornstein (1970b) for generalized Bernoulli shifts with finite or infinite entropy.

The Ornstein isomorphism theorem for Bernoulli shifts has been applied by Gray (1975) to show the existence of a class of sliding-block noiseless source codes for a large class of ergodic sources.


CHAPTER 9

BROWNIAN MOTION AND STOCHASTIC INTEGRALS


9.1 STOCHASTIC PROCESSES

A stochastic process on a probability space $(\Omega, \mathcal{F}, P)$ is a family of random variables $(X_t)_{t \in T}$, where the index set $T$ can be any set. If $T = \mathbb{N}$, a stochastic process is simply a sequence of random variables $X_n$ on the probability space $(\Omega, \mathcal{F}, P)$, and the indices $0, 1, \ldots$ may represent successive times. For instance, a gambler plays at times $0, 1, 2, \ldots$; $X_0$ is the initial fortune of the gambler and $X_n$ is the fortune at time $n$. In this chapter we will consider only the case $T = \mathbb{R}^+ = [0, \infty)$. The index $t$ can again be thought of as denoting time and, for each fixed $\omega$, the function $t \to X_t(\omega)$ is interpreted as the path of $\omega$. We will say that the paths are continuous, or that the process is continuous (resp. right-continuous) if the functions $t \to X_t(\omega)$ are continuous (resp. right-continuous) for each $\omega$.

Stochastic processes are often constructed as mathematical models. For in- 
stance, we may try to build a mathematical model for the number N,(w) of 
phone calls arriving at a switchboard in the interval of time [0, ¢]. Assume that 
the data support the following assumptions about the random variables N,. 

l. No=Oas. 

2. IfO<t, <t) <--- <t the increments N,,—N,,...,N:, —N;,_, are 
independent. 

3. Fors < t the increment N, — N, has a Poisson distribution with parameter 

X(t — s), where i is the average number of calls in one unit of time. 

One can construct the paths of the stochastic process $(N_t)_{t \in \mathbb{R}^+}$ (called the Poisson process) in the following way. Let $T_1 < T_2 < \cdots < T_n < \cdots$ be the successive arrival times of the calls, and $W_1 = T_1$, $W_2 = T_2 - T_1, \ldots, W_n = T_n - T_{n-1}, \ldots$ the waiting times between calls. Assume that the $W_n$ are independent, identically distributed, each with exponential distribution with mean $1/\lambda$. The process $(N_t)_{t \ge 0}$ defined by $N_t = \sum_n I_{\{T_n \le t\}}$ satisfies conditions 1-3. It follows from the construction that, for almost every $\omega$, the path $t \to N_t(\omega)$ is a right-continuous, nondecreasing function that increases only by jumps of size 1. These are properties that we expect for the number of calls arriving in the interval $[0, t]$.
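The construction from waiting times can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `poisson_arrival_times` and `count_at`, the rate 2.0, the horizon 5.0 and the number of runs are hypothetical choices made for the example, not part of the text.

```python
import random

def poisson_arrival_times(rate, horizon, rng):
    """Arrival times T_1 < T_2 < ... that fall in [0, horizon], built from
    i.i.d. exponential waiting times W_n with mean 1/rate."""
    times = []
    t = rng.expovariate(rate)       # W_1 = T_1
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)  # T_n = T_{n-1} + W_n
    return times

def count_at(times, t):
    """N_t = number of arrival times T_n with T_n <= t."""
    return sum(1 for s in times if s <= t)

rng = random.Random(0)
rate, horizon, runs = 2.0, 5.0, 20000
# Empirical mean of N_horizon over many independent paths.
mean_count = sum(
    len(poisson_arrival_times(rate, horizon, rng)) for _ in range(runs)
) / runs
```

Averaged over many runs, the count at the horizon should be close to $\lambda t$ (here $2 \times 5 = 10$), in line with condition 3, and each simulated path $t \to N_t$ is nondecreasing with jumps of size 1 by construction.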

We can also use the Kolmogorov extension theorem to construct the Poisson process. Let $0 = t_1 < t_2 < \cdots < t_i < t_{i+1} < \cdots < t_n$; denote by $S$ the set $\{t_1, \ldots, t_n\}$ and by $\mathcal{F}_S$ the $\sigma$-field of subsets of $\Omega = \mathbb{R}^{\mathbb{R}^+}$ generated by the coordinates $X_{t_1}, \ldots, X_{t_n}$. Let $\pi_S$ be the probability on $(\Omega, \mathcal{F}_S)$ which makes $X_0 = 0$ a.s., and the $X_{t_k} - X_{t_{k-1}}$ independent and Poisson distributed with parameters $\lambda(t_k - t_{k-1})$, for $k = 2, \ldots, n$. If the $\pi_S$ form a consistent system, then we can apply the Kolmogorov extension theorem 2.7.5 and construct a probability $P$ on $\Omega$ for which the $X_t$ satisfy conditions 1-3. The problem is that we now have to study the paths $t \to X_t$. Is it true that, outside of a set of $P$-probability 0, the functions $t \to X_t$ are right-continuous, nondecreasing and increase only by jumps of size 1? Unfortunately it is probably not true (see Problem 3); we have to modify each $X_t$ on a set of probability 0 in such a way that the new paths satisfy all these intuitive properties. This will be the subject of Problem 7 in 9.2.

The above discussion leads to the notion of a version of a process. Let $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ be two stochastic processes on the same probability space $(\Omega, \mathcal{F}, P)$. The process $(Y_t)_{t \ge 0}$ is a version of the process $(X_t)_{t \ge 0}$ iff, for each $t \in \mathbb{R}^+$, we have $P(X_t = Y_t) = 1$. This does not imply the stronger condition $P(X_t = Y_t \ \forall t \in \mathbb{R}^+) = 1$ (the set $\{X_t = Y_t \ \forall t \in \mathbb{R}^+\}$ is generally not even measurable), but it does imply that, for any countable subset $J$ of $\mathbb{R}^+$, $P(X_t = Y_t \ \forall t \in J) = 1$. In particular the processes $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ have the same finite-dimensional distributions. The regularity of the paths of a process can always be destroyed by the wrong choice of version (see Problem 3).

In the next section we construct another stochastic process, Brownian motion $(B_t)_{t \in \mathbb{R}^+}$. The random variables $B_t$ will satisfy conditions 1 and 2 above, but, for $s < t$, the distribution of $B_t - B_s$ will be normal with mean 0 and variance $t - s$. Since the process is constructed as a model for the movement of a particle, we would like its paths to be continuous. Unfortunately there is no intuitive construction of Brownian motion that gives us continuous paths; we will use the Kolmogorov extension theorem and then modify the process to obtain continuous paths.

We conclude this section with a few easy consequences of path regularity. Let $(X_t)_{t \ge 0}$ be a stochastic process; since $X_t$ is a random variable, for each fixed $t$, the function $\omega \to X_t(\omega)$ is measurable from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. The functions $t \to X_t(\omega)$ from $(\mathbb{R}^+, \mathcal{B}(\mathbb{R}^+))$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ are not necessarily measurable. But they are measurable if the paths are all regular enough, for example if, for each $\omega$, the function $t \to X_t(\omega)$ is right-continuous.

The stochastic process $(X_t)_{t \ge 0}$ can be considered as a function $X$ of the two variables $t$ and $\omega$. The process $(X_t)_{t \ge 0}$ is said to be measurable iff $X$ is measurable as a function from $(\Omega \times \mathbb{R}^+, \mathcal{F} \times \mathcal{B}(\mathbb{R}^+))$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$; if the process $(X_t)_{t \ge 0}$ is measurable and $|X|$ is not too large, we can consider integrals of the form $E[\int_0^a X(t, \omega)\,dt]$ (Fubini's theorem). Again, right-continuity of the paths is sufficient to assure the measurability of $X$ (see Problem 1).


Problems 


1. Let $(X_t)_{t \ge 0}$ be a stochastic process. Show that, if the paths are right-continuous (or left-continuous), the process $(X_t)_{t \ge 0}$ is measurable. [Hint: if the paths of the process are right-continuous, consider the processes $X_n$ defined by $X_n(t, \omega) = X((k+1)/n, \omega)$ if $k/n \le t < (k+1)/n$, $k = 0, 1, \ldots$.]

2. Let $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ be two processes on $(\Omega, \mathcal{F}, P)$. Assume that $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ are such that, for each $t \ge 0$, $P(X_t = Y_t) = 1$, and that they both have right-continuous paths. Show that the set $\{\omega: X_t(\omega) \ne Y_t(\omega)$ for at least one $t\}$ is measurable and has probability 0. (The same holds if both processes have left-continuous paths.)


3. Let $(X_t)_{t \ge 0}$ be a continuous process. Assume that the given probability space has a random variable $Z$ defined on it such that $P(Z = a) = 0$ for every real number $a$. (If necessary enlarge the probability space.) Show that there exists a version $(Y_t)_{t \ge 0}$ of $(X_t)_{t \ge 0}$ such that the paths of $(Y_t)_{t \ge 0}$ are nowhere continuous. [Hint: let $f(t)$ be the function that is 1 on the rationals and 0 elsewhere. Consider the process $Y_t = X_t + f(Z + t)$.]


9.2 BROWNIAN MOTION

9.2.1 Definition. A Brownian motion is a stochastic process $(B_t)_{t \in \mathbb{R}^+}$ with the following properties.


1. The increments on disjoint intervals are independent: if $0 \le t_1 < t_2 < \cdots < t_n$, the random variables $B_{t_2} - B_{t_1}, \ldots, B_{t_n} - B_{t_{n-1}}$ are independent.


2. If $s < t$, the increment $B_t - B_s$ of the process on the interval $(s, t]$ is normally distributed with mean 0 and variance $t - s$.


3. The process starts a.s. at 0: $B_0 = 0$ with probability one.


4. The paths of the process $B_t$ are all continuous.
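Conditions 1-3 describe how to sample the process at finitely many times, which can be sketched as follows. The helper name `brownian_path`, the grid of 100 points on $[0, 1]$ and the sample size are assumptions made for this illustration; a finite grid says nothing about condition 4, which concerns the whole path.

```python
import math
import random

def brownian_path(times, rng):
    """Sample B at the increasing times given, starting from B_0 = 0.
    Each increment B_t - B_s is drawn as an independent N(0, t - s) variable."""
    path = [0.0]   # B_0 = 0 (condition 3)
    prev = 0.0
    for t in times:
        # rng.gauss takes a standard deviation, hence the square root.
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(t - prev)))
        prev = t
    return path    # path[k] is the value at times[k - 1] for k >= 1

rng = random.Random(1)
grid = [k / 100 for k in range(1, 101)]
# Empirical second moment of B_1, which should be close to the variance 1.
samples = [brownian_path(grid, rng)[-1] for _ in range(5000)]
var_b1 = sum(x * x for x in samples) / len(samples)
```

The value at time 1 is a sum of 100 independent increments of variance 1/100 each, so its empirical variance should be near 1, as condition 2 requires.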


9.2.2 Remarks. 1. The concept of Brownian motion goes back to the observation by the botanist Brown of the random movement of particles of pollen in water. Einstein and Smoluchowski showed that a good approximation for the projection of such a random movement on a line was given by the conditions in Definition 9.2.1, where the variance of $B_t - B_s$ is only assumed to be proportional to $t - s$. The coefficient of proportionality depends on the fluid. A rigorous proof of the existence of a continuous version was given by Wiener. For this reason the Brownian motion is often called the Wiener process and denoted by $W_t$.

2. The probability space $(\Omega, \mathcal{F}, P)$ is not specified, and we are free to use whatever seems appropriate.

3. Conditions 1, 2, and 3 involve only the finite-dimensional distributions of the process. Condition 3 is mild. [If $(X_t)_{t \ge 0}$ is a process satisfying conditions 1, 2, 4, the process $Y_t = X_t - X_0$ is a Brownian motion.] The existence of a process satisfying 1, 2, 3 will be an easy consequence of the Kolmogorov extension theorem. It will be much harder to construct a process satisfying condition 4 as well.

4. Condition 4 is sometimes weakened to "outside of a set of probability 0 the paths of $(B_t)_{t \ge 0}$ are continuous." Probabilistically, it does not matter whether we work with the strong or the weak condition since we can always restrict ourselves to the set of probability 1 on which the paths are well behaved.


9.2.3 Theorem. There exists a probability $P$ on the measurable space $(\mathbb{R}^{\mathbb{R}^+}, \mathcal{B}(\mathbb{R})^{\mathbb{R}^+})$ such that, for this probability, the coordinate functions $X_t$ satisfy properties 1, 2, 3 of Definition 9.2.1.


Proof. Let $0 = t_1 < t_2 < \cdots < t_i < t_{i+1} < \cdots < t_n$; denote by $S$ the set $\{t_1, \ldots, t_n\}$ and by $\mathcal{F}_S$ the $\sigma$-field of subsets of $\Omega = \mathbb{R}^{\mathbb{R}^+}$ generated by the coordinates $X_{t_1}, \ldots, X_{t_n}$. Let $\pi_S$ be the probability on $(\Omega, \mathcal{F}_S)$ which makes $X_0 = 0$ a.s. and the $X_{t_k} - X_{t_{k-1}}$ independent and normally distributed with mean 0 and variances $t_k - t_{k-1}$, for $k = 2, \ldots, n$. If the $\pi_S$ form a consistent system we can apply the Kolmogorov extension theorem 2.7.5, and the existence of the probability $P$ is proven. To verify the consistency of the $\pi_S$, we have to show that, if $S_i$ denotes the subset of $S$ obtained by deleting the element $t_i$ for one $i > 1$, the restriction of $\pi_S$ to the $\sigma$-field $\mathcal{F}_{S_i}$ is $\pi_{S_i}$. We know that for $\pi_S$ the random variables $X_{t_2} - X_{t_1}, \ldots, X_{t_i} - X_{t_{i-1}}, X_{t_{i+1}} - X_{t_i}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent and normally distributed with mean 0 and variances $t_2 - t_1, \ldots, t_i - t_{i-1}, t_{i+1} - t_i, \ldots, t_n - t_{n-1}$. We want to show that for $\pi_S$ the random variables $X_{t_2} - X_{t_1}, \ldots, X_{t_{i+1}} - X_{t_{i-1}}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent and normally distributed with mean 0 and variances $t_2 - t_1, \ldots, t_{i+1} - t_{i-1}, \ldots, t_n - t_{n-1}$. This is true since the random variable $X_{t_{i+1}} - X_{t_{i-1}}$ is the sum of the two independent and normally distributed random variables $X_{t_{i+1}} - X_{t_i}$ and $X_{t_i} - X_{t_{i-1}}$. □


9.2.4 Remark. If $P$ is the probability constructed in 9.2.3, then, by Chebyshev's inequality, $P(|X_t - X_s| \ge \varepsilon) \le (t - s)/\varepsilon^2$. Therefore $X_t$ converges in probability to $X_s$ as $t \to s$. But convergence in probability is not enough; we want pointwise convergence. Assume the existence of a continuous version $(Y_t)_{t \ge 0}$ of the process $(X_t)_{t \ge 0}$. Even if $(X_t)_{t \ge 0}$ has discontinuous paths, there will be traces of continuity left on the countable subsets of $\mathbb{R}^+$. If $J$ is such a countable subset, then the process $(X_t)_{t \ge 0}$ will be almost surely continuous on $J$, since the set $\{X_t = Y_t \ \forall t \in J\}$ has probability one. Therefore our first step is to study the continuity of $(X_t)_{t \ge 0}$ on a countable dense subset of $\mathbb{R}^+$. We start with a lemma.


9.2.5 Lemma. Let $0 = t_0 < \cdots < t_n$. If the process $(X_t)_{t \ge 0}$ satisfies properties 1-3 of Definition 9.2.1, we have, for $a > 0$,

$$P\Big(\max_{i=0,\ldots,n} X_{t_i} > a\Big) \le 2P(X_{t_n} > a),$$

$$P\Big(\max_{i=0,\ldots,n} |X_{t_i}| > a\Big) \le 2P(|X_{t_n}| > a).$$


Proof. Let $T$ be defined on $\{\max_{i=0,\ldots,n} X_{t_i} > a\}$ as the first time $t_i$ such that $X_{t_i} > a$. On the set $\{\max_{i=0,\ldots,n} X_{t_i} \le a\}$, we take $T = \infty$. We have, using the fact that for $i = 0, \ldots, n-1$, the $X_{t_n} - X_{t_i}$ have a symmetric distribution,

$$P\Big(\max_{i=0,\ldots,n} X_{t_i} > a\Big) = \sum_{i=0}^{n} P(T = t_i) = 2\sum_{i=0}^{n-1} P(T = t_i)\,P(X_{t_n} - X_{t_i} > 0) + P(T = t_n).$$

Since the increment $X_{t_n} - X_{t_i}$ is independent of the $\sigma$-field $\sigma(X_{t_0}, \ldots, X_{t_i})$, which contains the set $\{T = t_i\}$, we obtain

$$P\Big(\max_{i=0,\ldots,n} X_{t_i} > a\Big) = 2\sum_{i=0}^{n-1} P(T = t_i,\ X_{t_n} - X_{t_i} > 0) + P(T = t_n)$$

$$\le 2\sum_{i=0}^{n-1} P(T = t_i,\ X_{t_n} > a) + P(T = t_n,\ X_{t_n} > a)$$

$$\le 2P(X_{t_n} > a).$$

For the second inequality it is sufficient to notice that the process $(-X_t)_{t \ge 0}$ also satisfies conditions 1-3 of 9.2.1, and that

$$P\Big(\min_{i=0,\ldots,n} X_{t_i} < -a\Big) = P\Big(\max_{i=0,\ldots,n} (-X_{t_i}) > a\Big) \le 2P(-X_{t_n} > a) = 2P(X_{t_n} < -a). \qquad \square$$


9.2.6 Corollary. If the process $(X_t)_{t \ge 0}$ satisfies conditions 1-3 of 9.2.1, and $J$ is a countable subset of $[0, N]$, then, for $a > 0$, $P(\sup_{t \in J} |X_t| > a) \le 2P(|X_N| > a)$.
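The maximal inequality can be checked empirically on a finite grid, which is exactly the setting of Lemma 9.2.5. The sketch below is a Monte Carlo illustration only: the helper `max_abs_and_final`, the level $a = 0.7$, the 200 grid points and the 4000 runs are assumptions chosen for the example, and the estimates carry sampling error.

```python
import math
import random

def max_abs_and_final(n_steps, horizon, rng):
    """Return (max_i |X_{t_i}|, X_{t_n}) for one path sampled on an even grid of [0, horizon]."""
    dt = horizon / n_steps
    x = 0.0        # X_{t_0} = 0 (condition 3)
    running = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, math.sqrt(dt))  # independent N(0, dt) increment
        running = max(running, abs(x))
    return running, x

rng = random.Random(2)
a, runs = 0.7, 4000
exceed_max = exceed_final = 0
for _ in range(runs):
    m, x = max_abs_and_final(200, 1.0, rng)
    exceed_max += m > a          # path whose running maximum of |X| passes a
    exceed_final += abs(x) > a   # path whose final value alone passes a

lhs = exceed_max / runs          # estimate of P(max_i |X_{t_i}| > a)
rhs = 2.0 * exceed_final / runs  # estimate of 2 P(|X_{t_n}| > a)
```

Up to sampling error, `lhs` should not exceed `rhs`. The factor 2 is what pays for replacing the whole path by its endpoint.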


9.2.7 Theorem. Let $(X_t)_{t \ge 0}$ be the process constructed in 9.2.3. For every $N > 0$ the restriction of the process $(X_t)_{t \ge 0}$ to the dyadic rationals $k/2^n$ is a.s. uniformly continuous on $[0, N]$.


Proof. By rescaling the real line we can always assume that $N = 1$. Let $S$ be the set of all dyadic rationals in $[0, 1]$. We have to show that

$$Y_n(\omega) = \sup_{t, s \in S,\ |t - s| \le 1/2^n} |X_t(\omega) - X_s(\omega)| \to 0 \quad \text{a.s.}$$

and, since $Y_n \ge Y_{n+1}$, it is sufficient (by (2.5.3)) to show that, for all $a > 0$, $P(Y_n > a) \to 0$.

The random variables $Y_n$ are hard to work with and the trick is to bound $Y_n$ by $3 \max_k Z_{n,k}$, where

$$Z_{n,k} = \sup_{t \in S \cap [k/2^n,\ (k+1)/2^n]} |X_t - X_{k/2^n}|, \qquad k = 0, \ldots, 2^n - 1,$$

and to use 9.2.6 to show that $P(\max_k Z_{n,k} > a) \to 0$ as $n \to \infty$. As the random variables $Z_{n,k}$, $k = 0, \ldots, 2^n - 1$, are identically distributed,

$$P\Big(\max_k Z_{n,k} > a\Big) = P\Big(\bigcup_k \{Z_{n,k} > a\}\Big) \le \sum_{k=0}^{2^n - 1} P(Z_{n,k} > a) = 2^n P(Z_{n,0} > a).$$

Now by 9.2.6,

$$P(Z_{n,0} > a) = P\Big(\sup_{t \in S \cap [0,\ 1/2^n]} |X_t| > a\Big) \le 2P(|X_{1/2^n}| > a),$$

so that

$$P\Big(\max_k Z_{n,k} > a\Big) \le 2^{n+1} P(|X_{1/2^n}| > a).$$

This last quantity converges to 0 as $n \to \infty$ since $X_{1/2^n}$ has a normal distribution with mean 0 and variance $1/2^n$ (see 9.2.8 below). □


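9.2.8 Lemma. Let $a > 0$, $\beta > 0$ and assume that $X_\beta$ is a normally distributed random variable with mean 0 and variance $\beta$. Then

$$\frac{1}{\beta} P(|X_\beta| > a) \to 0 \quad \text{as } \beta \to 0.$$

Proof.

$$\frac{1}{\beta} P(|X_\beta| > a) = \frac{2}{\beta\sqrt{2\pi\beta}} \int_a^\infty e^{-x^2/2\beta}\,dx = \frac{2}{\beta\sqrt{\pi}} \int_{a/\sqrt{2\beta}}^\infty e^{-y^2}\,dy$$

$$\le \frac{2}{\beta\sqrt{\pi}} \int_{a/\sqrt{2\beta}}^\infty y\,e^{-y^2}\,dy = \frac{1}{\beta\sqrt{\pi}}\, e^{-a^2/2\beta} \to 0$$

if $\beta$ is small enough to assure that $a/\sqrt{2\beta} > 1$. □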
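The rate in Lemma 9.2.8 can be evaluated numerically. The sketch below uses the standard identity $P(|X_\beta| > a) = \operatorname{erfc}(a/\sqrt{2\beta})$ for a centered normal variable with variance $\beta$; the function name `scaled_tail` and the choice $a = 1$, $\beta = 2^{-n}$ are assumptions made for illustration.

```python
import math

def scaled_tail(a, beta):
    """(1/beta) * P(|X_beta| > a) for X_beta ~ N(0, beta)."""
    # For a centered normal with variance beta, P(|X| > a) = erfc(a / sqrt(2 * beta)).
    return math.erfc(a / math.sqrt(2.0 * beta)) / beta

# beta = 1/2, 1/4, ..., 1/128: the scaled tail probability decreases rapidly.
values = [scaled_tail(1.0, 2.0 ** -n) for n in range(1, 8)]
```

With $\beta = 2^{-n}$ the quantity $2^{n+1} P(|X_{1/2^n}| > a)$ appearing at the end of the proof of 9.2.7 is `2 * scaled_tail(a, 2.0 ** -n)`, so the same numbers show why that bound tends to 0.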


We are now ready to construct a continuous version of the process $(X_t)_{t \ge 0}$.

9.2.9 Theorem. There exists a process satisfying the conditions of 9.2.1.


Proof. With $Y_n$ defined as in the proof of 9.2.7, let $A$ be the set $\{Y_n \to 0\}$. The set $A$ is measurable and has probability 1. On $A$ we define, for all $t$, $B_t(\omega)$ as the limit of the random variables $X_s(\omega)$ as $s$ approaches $t$ along the dyadic rationals. On the complement of $A$ we take $B_t(\omega) = 0$ for all $t$. By 9.2.7, the process $(B_t)_{t \ge 0}$ is continuous, and we just have to prove that $(B_t)_{t \ge 0}$ is a version of $(X_t)_{t \ge 0}$. Let $t \ge 0$, and $(s_n)$ be a sequence of dyadic rationals converging to $t$. The random variables $X_{s_n}$ converge a.s. to $B_t$ and in probability to $X_t$ (9.2.4), which is enough to assure that $B_t = X_t$ a.s. □


9.2.10 Corollary. If $(B_t)_{t \ge 0}$ is a Brownian motion, then the functions $\sup_{t \le t_0} B_t$ and $\sup_{t \le t_0} |B_t|$ are measurable and satisfy, for $a > 0$, the following inequalities:

$$P\Big(\sup_{t \le t_0} B_t > a\Big) \le 2P(B_{t_0} > a) \quad \text{and} \quad P\Big(\sup_{t \le t_0} |B_t| > a\Big) \le 2P(|B_{t_0}| > a).$$


Proof. The continuity of the paths of $(B_t)_{t \ge 0}$ allows us to generalize 9.2.6 to $\mathbb{R}^+$. □


Problems 


1. Let $(X_t)_{t \ge 0}$ be a continuous process. Show that $(X_t)_{t \ge 0}$ is a Brownian motion iff, for any finite subset $\{t_1, \ldots, t_n\}$ of $\mathbb{R}^+$, the random vector $(X_{t_1}, \ldots, X_{t_n})$ is Gaussian with mean 0 and covariance $a_{i,j} = \min(t_i, t_j)$. (See Appendix 5 for the definition and properties of a Gaussian random vector.)


2. A stochastic process $(X_t)_{t \ge 0}$ is continuous in probability iff, for each $s \in \mathbb{R}^+$, $X_t \to X_s$ in probability as $t \to s$. Show that, if two processes $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ have the same finite-dimensional distributions, $(X_t)_{t \ge 0}$ is continuous in probability iff $(Y_t)_{t \ge 0}$ is.


3. Let $(B_t)_{t \ge 0}$ be a Brownian motion and $c > 0$. Show that the process

$$X_t = \frac{1}{c} B_{c^2 t}$$

is a Brownian motion.
4. Let $(B_t)_{t \ge 0}$ be a Brownian motion. Show that the process

$$X_t = \begin{cases} t B_{1/t}, & \text{if } t > 0 \\ 0, & \text{if } t = 0, \end{cases}$$

satisfies conditions 1-3 of Definition 9.2.1, and is continuous outside of a set of probability 0. The process $(X_t)_{t \ge 0}$ is therefore a Brownian motion and the properties of the Brownian motion at 0 and $\infty$ are related. [Hint: use Problem 1 to show that the process $(X_t)_{t \ge 0}$ satisfies conditions 1-3 of Definition 9.2.1, and then Theorem 9.2.7 to show the a.s. continuity at 0.]


5. Let $(B_t)_{t \ge 0}$ be a Brownian motion and $(X_t)_{0 \le t \le 1}$ be the process defined as

$$X_t = B_t - tB_1, \qquad 0 \le t \le 1.$$

The process $(X_t)_{0 \le t \le 1}$ is called a Brownian Bridge.

(a) Compute the means $E(X_t)$ and the covariances $E(X_s X_t)$. What is the distribution of $X_t$?

(b) Show that the process

$$Y_t = X_{1-t}, \qquad 0 \le t \le 1,$$

has the same finite-dimensional distributions as the process $(X_t)_{0 \le t \le 1}$.

(c) Let $U_1, \ldots, U_n$ be independent random variables with uniform distributions on $(0, 1)$ and let

$$F_n(t, \omega) = \frac{1}{n} \sum_{i=1}^{n} I_{\{U_i(\omega) \le t\}}, \qquad 0 \le t \le 1.$$

[$F_n(t, \omega)$ is an empirical distribution for the uniform distribution on $(0, 1)$.] Show that the process

$$X_n(t, \omega) = \sqrt{n}\,(F_n(t, \omega) - t), \qquad 0 \le t \le 1,$$

has the same means and covariances as the Brownian Bridge.

(d) Show that, for each $t$, the random variables $X_n(t)$ converge in distribution to the random variable $X_t$.


6. Let $(B_t)_{t \ge 0}$ be a Brownian motion. Show that $B_t/t \to 0$ a.s. as $t \to \infty$ as follows.

(a) Let $X_1, X_2, \ldots$ be identically distributed random variables. Show that if $E|X_1| < \infty$, then for all $a > 0$, $P(|X_n| > an$ infinitely often$) = 0$.

(b) Show that $B_n/n \to 0$ a.s. as the integer $n \to \infty$.

(c) Let

$$X_n = \max_{n \le t \le n+1} |B_t - B_n|;$$

show that $E|X_n| < \infty$.

(d) Conclude by noticing that for $n \le t \le n + 1$

$$\left|\frac{B_t}{t}\right| \le \frac{|B_n|}{n} + \frac{X_n}{n}.$$

(e) Show that we could have used Problem 4 to prove directly that $B_t/t \to 0$ a.s. as $t \to \infty$.

[Hint: use the equality $E(X) = \int_0^\infty P(X > \lambda)\,d\lambda$ for nonnegative random variables $X$, the strong law of large numbers and 9.2.10.]


7. The purpose of this exercise is to construct the Poisson process using the Kolmogorov extension theorem.

(a) Use the Kolmogorov Extension Theorem to construct a probability $P$ on the measurable space $(\mathbb{R}^{\mathbb{R}^+}, \mathcal{B}(\mathbb{R})^{\mathbb{R}^+})$, such that, for this probability, the coordinate functions $X_t$ satisfy properties i, ii, iii below:

i. $X_0 = 0$ a.s.
ii. If $0 \le t_1 < t_2 < \cdots < t_n$, the increments $X_{t_2} - X_{t_1}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent.
iii. For $s < t$, the increment $X_t - X_s$ has a Poisson distribution with parameter $\lambda(t - s)$.

(b) Let $S$ be the set of dyadic rationals. Show that outside of a set of probability 0 we have

i. $X_0 = 0$.
ii. The restrictions to $S$ of the paths $t \to X_t(\omega)$ are nondecreasing, and take only nonnegative integer values.

(c) Modify the process $(X_t)_{t \ge 0}$ to obtain a right-continuous, integer valued process $(N_t)_{t \ge 0}$ satisfying the properties in question (a) and whose paths are nondecreasing.

(d) Show that, outside of a set of probability 0, the paths of $(N_t)_{t \ge 0}$ increase only by jumps of size 1.

[Hint: Let $T > 0$ be fixed. To prove (d), examine the following limit:

$$\lim_{n} P\Big(\sup_{i/2^n \le T} |N_{(i+1)/2^n} - N_{i/2^n}| \ge 2\Big).]$$


9.3 NOWHERE DIFFERENTIABILITY AND QUADRATIC VARIATION OF PATHS

The paths of the Brownian motion are continuous but they are nowhere differentiable.


9.3.1 Theorem. Let $(B_t)_{t \ge 0}$ be a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$. Then the paths of $(B_t)_{t \ge 0}$ are a.s. nowhere differentiable.


As stated above, the theorem is slightly misleading. The set $\{\omega: t \to B_t(\omega)$ is differentiable at least at one point$\}$ is not necessarily measurable, but we will show that it is included in a measurable set of probability 0.


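Proof.

We start by studying what differentiability at one point implies for a function. Let $f(t)$ be a real-valued function on $[0, \infty)$; assume that it has a derivative $f'(t_0)$ at a point $t_0$ and choose a real number $a > 0$ such that $|f'(t_0)| < a$. Then there exists a positive integer $n_0$ such that, for $n \ge n_0$,

$$|f(t) - f(t_0)| \le 2a|t - t_0|, \quad \text{if } |t - t_0| \le 3/n. \qquad (1)$$

For $n \ge n_0$ let $k$ be the integer such that $(k-1)/n \le t_0 < k/n$. The points $(k-1)/n$, $k/n$, $(k+1)/n$, $(k+2)/n$ are within distance $3/n$ of $t_0$; applying (1) and the triangle inequality we obtain, for example,

$$\left| f\Big(\frac{k+2}{n}\Big) - f\Big(\frac{k+1}{n}\Big) \right| \le \left| f\Big(\frac{k+2}{n}\Big) - f(t_0) \right| + \left| f(t_0) - f\Big(\frac{k+1}{n}\Big) \right|$$

$$\le 2a \left| \frac{k+2}{n} - t_0 \right| + 2a \left| \frac{k+1}{n} - t_0 \right| \le \frac{10a}{n}.$$

Using the same method, we obtain similar inequalities for $|f((k+1)/n) - f(k/n)|$ and $|f(k/n) - f((k-1)/n)|$, which gives us

$$\max\left( \left| f\Big(\frac{k+2}{n}\Big) - f\Big(\frac{k+1}{n}\Big) \right|,\ \left| f\Big(\frac{k+1}{n}\Big) - f\Big(\frac{k}{n}\Big) \right|,\ \left| f\Big(\frac{k}{n}\Big) - f\Big(\frac{k-1}{n}\Big) \right| \right) \le \frac{10a}{n}.$$

Now we apply all this to the Brownian motion. Denote by $X_k$ the random variable

$$X_k = \max\left( \left| B\Big(\frac{k+2}{n}\Big) - B\Big(\frac{k+1}{n}\Big) \right|,\ \left| B\Big(\frac{k+1}{n}\Big) - B\Big(\frac{k}{n}\Big) \right|,\ \left| B\Big(\frac{k}{n}\Big) - B\Big(\frac{k-1}{n}\Big) \right| \right).$$

Let $A$ be the set of $\omega$'s such that, somewhere on $[0, 1)$, the function $t \to B_t(\omega)$ has a derivative bounded in absolute value by $a$. According to the preceding discussion, any $\omega$ in $A$ is, for $n$ big enough, in $A_n$, where $A_n$ is the set of $\omega$'s such that for at least one $k$ in $\{1, 2, \ldots, n\}$ we have $X_k(\omega) \le (10a)/n$. If we show that $P(\liminf_n A_n) = 0$, we will have proved that, on $[0, 1)$, the paths have a.s. no derivative bounded by $a$ in absolute value.

Since the increments of the Brownian motion on the intervals $(i/n, (i+1)/n]$ are independent and identically distributed, we have

$$P(A_n) \le \sum_{k=1}^{n} P\Big(X_k \le \frac{10a}{n}\Big) = n \left[ P\Big( \Big|B\Big(\frac{1}{n}\Big)\Big| \le \frac{10a}{n} \Big) \right]^3$$

$$= n \left[ \sqrt{\frac{n}{2\pi}} \int_{-10a/n}^{10a/n} e^{-n x^2/2}\,dx \right]^3 = n \left[ \frac{1}{\sqrt{2\pi n}} \int_{-10a}^{10a} e^{-x^2/2n}\,dx \right]^3 \to 0,$$

and therefore $P(\liminf_n A_n) \le \liminf_n P(A_n) = 0$.

We have been studying differentiability on the interval $[0, 1)$ with the bound $a$, but the same technique works on $[0, N)$ with bounds $b$ as large as desired. The theorem is proved. □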
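The decay of the bound on $P(A_n)$ can be made concrete. The sketch below evaluates $n\,[P(|B(1/n)| \le 10a/n)]^3$ with the error function, using that $B(1/n)$ is normal with variance $1/n$; the function name `bound` and the sample values of $n$ are assumptions made for illustration.

```python
import math

def bound(n, a):
    """n * P(|B(1/n)| <= 10a/n)**3, the upper bound on P(A_n) used in the proof."""
    # B(1/n) ~ N(0, 1/n), so P(|B(1/n)| <= c) = erf(c * sqrt(n) / sqrt(2));
    # with c = 10a/n the argument simplifies to 10a / sqrt(2n).
    p = math.erf(10.0 * a / math.sqrt(2.0 * n))
    return n * p ** 3

# The bound is only informative for large n, where it decays like a constant over sqrt(n).
values = [bound(10 ** k, 1.0) for k in (4, 6, 8)]
```

For large $n$ the probability in brackets is about $20a/\sqrt{2\pi n}$, so the bound behaves like $(20a)^3 (2\pi)^{-3/2} n^{-1/2}$. The cube, coming from three independent increments, is what beats the factor $n$ from the union over $k$.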


9.3.2 Remark. Theorem 9.3.1 can be generalized to upper, lower, right, and left derivatives: its proof can be modified to show that, outside of a set of probability 0, $\liminf_{t \to s,\, t < s}(B_t - B_s)/(t - s)$, $\limsup_{t \to s,\, t < s}(B_t - B_s)/(t - s)$, $\liminf_{t \to s,\, t > s}(B_t - B_s)/(t - s)$ and $\limsup_{t \to s,\, t > s}(B_t - B_s)/(t - s)$ are nowhere finite.


9.3.3 Corollary. Almost every path of the Brownian motion has infinite vari- 
ation on every finite interval. 


Proof. By 2.3.9, if a path has finite variation on an interval $[a, b]$, then it is almost everywhere differentiable on the interval. □


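If $[a, b]$ is an interval of $\mathbb{R}^+$, and $\mathcal{P} = \{t_0, \ldots, t_k\}$ is any partition of $[a, b]$, we have just shown that $\sup_{\mathcal{P}} \sum_i |B(t_{i+1}) - B(t_i)| = \infty$. What can we say about the quadratic variation $\lim \sum_i |B(t_{i+1}) - B(t_i)|^2$ when the partitions $\mathcal{P}$ get finer?

9.3.4 Theorem. If $(B_t)_{t \ge 0}$ is a Brownian motion, $[a, b]$ is an interval of $\mathbb{R}^+$, and $\mathcal{P}_n = \{t_0^{(n)}, \ldots, t_{k_n}^{(n)}\}$ is a sequence of partitions of $[a, b]$, then

$$\sum_i \big|B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big|^2 \to (b - a) \quad \text{in } L^2 \text{ as } \max_i \big|t_{i+1}^{(n)} - t_i^{(n)}\big| \to 0.$$


Proof.

Let $S_n = \sum_i \big|B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big|^2$; we have

$$E(S_n - (b - a))^2 = E\Big[ \sum_i \Big( \big(B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big)^2 - \big(t_{i+1}^{(n)} - t_i^{(n)}\big) \Big) \Big]^2$$

$$= \sum_i E\Big[ \big(B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big)^2 - \big(t_{i+1}^{(n)} - t_i^{(n)}\big) \Big]^2,$$

since the $\big(B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big)^2 - \big(t_{i+1}^{(n)} - t_i^{(n)}\big)$ are independent random variables with mean 0. The random variables $B(t_{i+1}^{(n)}) - B(t_i^{(n)})$ are normally distributed with variance $t_{i+1}^{(n)} - t_i^{(n)}$; therefore, if $Z$ denotes a normally distributed random variable with mean 0 and variance 1,

$$E(S_n - (b - a))^2 = E(Z^2 - 1)^2 \sum_i \big(t_{i+1}^{(n)} - t_i^{(n)}\big)^2 \le E(Z^2 - 1)^2\,(b - a) \max_i \big|t_{i+1}^{(n)} - t_i^{(n)}\big| \to 0. \qquad \square$$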
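The $L^2$ convergence of the quadratic variation can be observed on simulated increments. The sketch below uses evenly spaced partitions of $[0, 2]$; the function names `quadratic_variation` and `l2_error`, the interval and the partition sizes are assumptions made for illustration. For an even partition into $n$ pieces the last display of the proof gives the exact value $E(S_n - (b-a))^2 = 2(b-a)^2/n$, since $E(Z^2 - 1)^2 = 2$.

```python
import math
import random

def quadratic_variation(a, b, n_intervals, rng):
    """S_n for an even partition of [a, b]: the sum of squared increments,
    each increment being an independent N(0, (b - a)/n_intervals) variable."""
    dt = (b - a) / n_intervals
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n_intervals))

def l2_error(a, b, n_intervals):
    """Exact E(S_n - (b - a))^2 for an even partition, using E(Z^2 - 1)^2 = 2."""
    return 2.0 * (b - a) ** 2 / n_intervals

rng = random.Random(3)
coarse = quadratic_variation(0.0, 2.0, 10, rng)      # noisy: few intervals
fine = quadratic_variation(0.0, 2.0, 100000, rng)    # should be close to b - a = 2
```

With 100000 intervals the root mean square error is about 0.009, so `fine` should sit very near 2, while `coarse` can be far off.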


Problems 


1. Show that, if the Brownian motion has a chord of slope greater than $a$ on the interval $[0, 1]$, then the process $(X_t)_{t \ge 0}$ defined in Problem 3 of 9.2 has, on $[0, 1/c^2]$, a chord with slope greater than $ca$. This shows intuitively why the Brownian motion is a.s. nowhere differentiable.


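2. Let $\mathcal{P}_n$ and $S_n$ be defined as in 9.3.4, and let $\|\mathcal{P}_n\| = \max_i \big|t_{i+1}^{(n)} - t_i^{(n)}\big|$. Show that, if for a sequence of partitions $\mathcal{P}_n$, $\sum_n \|\mathcal{P}_n\| < \infty$, then $S_n \to (b - a)$ a.s.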


9.4 LAW OF THE ITERATED LOGARITHM 
If $(B_t)_{t \ge 0}$ is a Brownian motion, we know that $B_t/t \to 0$ a.s. as $t \to \infty$ (Problem 6 of 9.2). In 9.4.4, we show that, more precisely, $B_t$ stays asymptotically in between $-\sqrt{2t \log(\log t)}$ and $\sqrt{2t \log(\log t)}$ as $t \to \infty$. According to Problem 4 of 9.2, the behaviors of $B_t$ at 0 and $\infty$ are related, and it is not surprising that a similar asymptotic property is satisfied at 0 (see Theorem 9.4.3). The following estimates will be useful in the proof of 9.4.3.


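9.4.1 Lemma. Let $X_t$ be a normally distributed random variable with mean 0 and variance $t$, and let $a$ be strictly positive. Then

$$P(X_t > a) \le \frac{\sqrt{t}}{a\sqrt{2\pi}}\, e^{-a^2/2t}.$$


Proof. Just notice that

$$P(X_t > a) = \frac{1}{\sqrt{2\pi t}} \int_a^\infty e^{-x^2/2t}\,dx \le \frac{1}{a\sqrt{2\pi t}} \int_a^\infty x\,e^{-x^2/2t}\,dx = \frac{\sqrt{t}}{a\sqrt{2\pi}}\, e^{-a^2/2t}. \qquad \square$$
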
9.4.2 Lemma. The ratio of $\int_x^\infty e^{-s^2/2}\,ds$ and $(e^{-x^2/2})/x$ converges to 1 as $x$ converges to $\infty$. (Thus the inequality of 9.4.1 is an asymptotic equality as $a \to \infty$.)


Proof. Apply L'Hôpital's Rule. □
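The ratio in Lemma 9.4.2 can be tabulated directly. The sketch below writes the tail integral through the complementary error function, $\int_x^\infty e^{-s^2/2}\,ds = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$; the function names `tail_integral` and `ratio` are assumptions made for illustration.

```python
import math

def tail_integral(x):
    """Integral of exp(-s^2 / 2) over [x, infinity)."""
    # Substituting s = sqrt(2) * y turns the integral into
    # sqrt(2) * int_{x/sqrt(2)}^inf exp(-y^2) dy = sqrt(pi / 2) * erfc(x / sqrt(2)).
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def ratio(x):
    """Tail integral divided by exp(-x^2 / 2) / x."""
    return tail_integral(x) / (math.exp(-x * x / 2.0) / x)

# The ratio stays below 1 (the inequality of 9.4.1) and increases toward 1.
ratios = [ratio(x) for x in (1.0, 5.0, 30.0)]
```

The ratio is below 1 for every $x > 0$, which is the inequality of 9.4.1 with $t = 1$, and it approaches 1 as $x$ grows, which is the asymptotic equality stated in the lemma.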


9.4.3 Theorem. If $B_t$ is a Brownian motion, then we have

$$\limsup_{t \to 0} \frac{B_t}{\sqrt{2t \log(\log 1/t)}} = 1 \quad \text{a.s.}$$

and

$$\liminf_{t \to 0} \frac{B_t}{\sqrt{2t \log(\log 1/t)}} = -1 \quad \text{a.s.}$$


Proof. The second statement of the theorem is a consequence of the first because $-B_t$ is also a Brownian motion.


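1. Let $u(t) = \sqrt{2t \log(\log 1/t)}$. We first show that

$$\limsup_{t \to 0} \frac{B_t}{\sqrt{2t \log(\log 1/t)}} \le 1 \quad \text{a.s.}$$

This is equivalent to: for almost every $\omega$, and any $\varepsilon > 0$, $B_t \le (1 + \varepsilon)u(t)$ when $t$ is near enough to 0.

Given $\varepsilon > 0$, choose $\alpha \in (0, 1)$ such that $\alpha(1 + \varepsilon)^2 > 1$; consider the decreasing sequence $t_n = \alpha^n$, and let $A_n$ be the set

$$A_n = \{\omega: B_t(\omega) > (1 + \varepsilon)u(t) \text{ for at least one } t \in (t_{n+1}, t_n]\}.$$

Using 9.2.10 and the fact that the function $u(t)$ is increasing for sufficiently small $t$ we obtain

$$P(A_n) \le P\Big(\sup_{t \le t_n} B_t > (1 + \varepsilon)u(t_{n+1})\Big) \le 2P\big(B_{t_n} > (1 + \varepsilon)u(t_{n+1})\big).$$

Therefore, by 9.4.1,

$$P(A_n) \le \frac{2\sqrt{t_n}}{\sqrt{2\pi}\,(1 + \varepsilon)\,u(t_{n+1})} \exp\left( -\frac{(1 + \varepsilon)^2 u(t_{n+1})^2}{2 t_n} \right).$$

Substituting $\alpha^n$ for $t_n$ we obtain the inequality

$$P(A_n) \le \frac{2}{\sqrt{2\pi}\,(1 + \varepsilon)\sqrt{2\alpha \log\big((n+1)\log\frac{1}{\alpha}\big)}} \exp\left( -(1 + \varepsilon)^2 \alpha \log\Big((n+1)\log\frac{1}{\alpha}\Big) \right) \le \frac{K}{(n+1)^{\beta}\sqrt{\log(n+1)}},$$

where $\beta = (1 + \varepsilon)^2 \alpha > 1$. Therefore $\sum_n P(A_n) < \infty$ and, by the first Borel-Cantelli Lemma 2.2.4, $P(\limsup_n A_n) = 0$. This assures that, for almost every $\omega$, $B_t \le (1 + \varepsilon)u(t)$ when $t$ is near enough to 0.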


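2. We now show that $\limsup_{t \to 0} \big(B_t/\sqrt{2t \log(\log 1/t)}\big) \ge 1$ a.s. by showing that, for any $\delta > 0$, there exists a sequence $t_1 > t_2 > \ldots$ decreasing to 0 such that $P(B_{t_n} > (1 - \delta)u(t_n)$ i.o.$) = 1$. (Recall that the abbreviation i.o. stands for infinitely often.)

We now choose $\varepsilon > 0$ and $\alpha \in (0, 1)$ small enough to guarantee that $(1 - \varepsilon)^2 < (1 - \alpha)$, and $(1 - \varepsilon) - (1 + \varepsilon)\sqrt{\alpha} > 1 - \delta$. Let $t_n = \alpha^n$ and consider the independent random variables $X_n = B_{t_n} - B_{t_{n+1}}$. We want to estimate $\sum_n P(X_n > (1 - \varepsilon)u(t_n))$.

$$P(X_n > (1 - \varepsilon)u(t_n)) = P\left( \frac{X_n}{\sqrt{t_n - t_{n+1}}} > \frac{(1 - \varepsilon)u(t_n)}{\sqrt{t_n - t_{n+1}}} \right) = P\left( \frac{X_n}{\sqrt{t_n - t_{n+1}}} > \frac{(1 - \varepsilon)\sqrt{2\log\big(n \log\frac{1}{\alpha}\big)}}{\sqrt{1 - \alpha}} \right).$$

Therefore, applying Lemma 9.4.2, we get, since $X_n/\sqrt{t_n - t_{n+1}}$ has a normal distribution with mean 0 and variance 1,

$$P(X_n > (1 - \varepsilon)u(t_n)) \sim \frac{K}{n^{\gamma}\sqrt{\log n}},$$

where $\gamma = (1 - \varepsilon)^2/(1 - \alpha) < 1$. This implies that $\sum_n P(X_n > (1 - \varepsilon)u(t_n)) = \infty$, and, by the second Borel-Cantelli Lemma 6.1.5, we have

$$P(X_n > (1 - \varepsilon)u(t_n) \text{ i.o.}) = 1.$$

If we apply part 1 of the proof to the process $-B_t$, we see that

$$P(B_{t_{n+1}} \ge -(1 + \varepsilon)u(t_{n+1}) \text{ for } n \text{ large enough}) = 1,$$

and, therefore,

$$P\left( B_{t_n} > u(t_n)\left[ (1 - \varepsilon) - (1 + \varepsilon)\frac{u(t_{n+1})}{u(t_n)} \right] \text{ i.o.} \right) = 1.$$

Since $u(t_{n+1})/u(t_n) \to \sqrt{\alpha}$ we obtain

$$P(B_{t_n} > (1 - \delta)u(t_n) \text{ i.o.}) = 1. \qquad \square$$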
We now give the result at $\infty$. We could prove the theorem directly, but it is easier to use Problem 4 of 9.2. If $B_t$ is a Brownian motion, so is the process $Y_t = tB_{1/t}$, and the properties of the paths of the Brownian motion at 0 and $\infty$ are related.


9.4.4 Theorem. If $B_t$ is a Brownian motion, then we have

$$\limsup_{t \to \infty} \frac{B_t}{\sqrt{2t \log(\log t)}} = 1 \quad \text{a.s.}$$

and

$$\liminf_{t \to \infty} \frac{B_t}{\sqrt{2t \log(\log t)}} = -1 \quad \text{a.s.}$$


Proof. Apply Theorem 9.4.3 to the process $Y_u = uB_{1/u}$, and take $t = 1/u$. □
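The change of variable in this proof rests on one algebraic identity between the two normalizing functions: for $s > e$, $s \cdot \sqrt{2 (1/s) \log\log s} = \sqrt{2 s \log\log s}$, so that $Y_u/\sqrt{2u\log(\log 1/u)}$ at $u = 1/t$ equals $B_t/\sqrt{2t\log(\log t)}$. A quick numerical check, with the hypothetical names `u_zero` and `u_infinity` for the two normalizations:

```python
import math

def u_zero(t):
    """Normalization near 0: sqrt(2 t log log(1/t)), defined for 0 < t < 1/e."""
    return math.sqrt(2.0 * t * math.log(math.log(1.0 / t)))

def u_infinity(s):
    """Normalization near infinity: sqrt(2 s log log s), defined for s > e."""
    return math.sqrt(2.0 * s * math.log(math.log(s)))

# With Y_u = u * B_{1/u} and t = 1/u:
#   Y_u / u_zero(u) = B_t / (t * u_zero(1/t)) = B_t / u_infinity(t).
checks = [(s, s * u_zero(1.0 / s), u_infinity(s)) for s in (10.0, 1e3, 1e6)]
```

Each triple in `checks` holds a time $s$ together with the two sides of the identity, which agree up to rounding.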


Problem


1. Let $S_n = \sum_{k=1}^{n} Y_k$, where the $Y_k$ are independent and each has a normal distribution with mean 0 and variance $\sigma^2$. Show that

$$\limsup_{n \to \infty} \frac{S_n}{\sqrt{2\sigma^2 n \log(\log n)}} = 1 \quad \text{a.s.}$$

and

$$\liminf_{n \to \infty} \frac{S_n}{\sqrt{2\sigma^2 n \log(\log n)}} = -1 \quad \text{a.s.}$$


9.5 THE MARKOV PROPERTY

The Markov property discussed in Chapter 4 can be expressed as follows: a process $(X_t)_{t \ge 0}$ satisfies the Markov property if the position of $X_t$, knowing what happened up to time $s$, $s < t$, depends on the value of $X_s$ and not on the values of $X_u$, $u < s$. The Markov property of the Brownian motion is a corollary of the definition of the Brownian motion.


9.5.1 Theorem (Markov Property). Let $(B_t)_{t \ge 0}$ be a Brownian motion. Then if $s > 0$, the process $Y_t = B_{t+s} - B_s$ is a Brownian motion independent of the $\sigma$-field $\mathcal{B}_s = \sigma(B_u, u \le s)$.

We would like to generalize this property by allowing $s$ to be a stopping time. We give the definition of stopping times with respect to a general nondecreasing family of $\sigma$-fields since it will be needed in the problems, but, in this section, we are only interested in the family $(\mathcal{B}_t)_{t \ge 0}$.


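9.5.2 Definition. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(\mathcal{F}_t)_{t \ge 0}$ be a nondecreasing family of sub-$\sigma$-fields of $\mathcal{F}$. A nonnegative random variable $T$ is a stopping time for the $\sigma$-fields $(\mathcal{F}_t)_{t \ge 0}$ iff, for each $t \ge 0$, $\{T \le t\} \in \mathcal{F}_t$. (The random variable $T$ is allowed to take the value $+\infty$.)

The $\sigma$-field $\mathcal{F}_T$ of events prior to $T$ is

$$\mathcal{F}_T = \{A \in \mathcal{F}: A \cap \{T \le t\} \in \mathcal{F}_t \ \forall t \ge 0\}.$$

It is also useful to consider the $\sigma$-field

$$\mathcal{F}_{T+} = \{A \in \mathcal{F}: A \cap \{T < t\} \in \mathcal{F}_t \ \forall t \ge 0\}.$$

These definitions are consistent with the definitions given in Chapter 6, since for the index set $\mathbb{N}$, the conditions $\{T = n\} \in \mathcal{F}_n$ for any $n$, and $\{T \le n\} \in \mathcal{F}_n$ for any $n$, are equivalent. When the index set is $\mathbb{R}^+$, the two conditions are no longer equivalent, and the condition $\{T = t\} \in \mathcal{F}_t$ for any $t$ is too weak to allow us to work with uncountable unions such as $\{T \le t\} = \bigcup_{s \le t}\{T = s\}$.

Constant times $T = t$ are stopping times. In that case, $\mathcal{F}_T = \mathcal{F}_t$ and $\mathcal{F}_{T+} = \mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s$.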

The following theorem gives some intuitive properties of stopping times 
and their associated o-fields. 


9.5.3 Theorem. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $(\mathcal{F}_t)_{t \ge 0}$ a nondecreasing family of sub-$\sigma$-fields of $\mathcal{F}$, and $S$ and $T$ be two stopping times for the family $(\mathcal{F}_t)_{t \ge 0}$.

(a) We have the inclusion $\mathcal{F}_T \subset \mathcal{F}_{T+}$.

(b) The random variable $T$ is $\mathcal{F}_T$-measurable.

(c) If $S \le T$, then $\mathcal{F}_S \subset \mathcal{F}_T$ and $\mathcal{F}_{S+} \subset \mathcal{F}_{T+}$.

(d) If $S < T$, then $\mathcal{F}_{S+} \subset \mathcal{F}_T$.

(e) If $U$ is an $\mathcal{F}_T$-measurable random variable and $T \le U$, then $U$ is a stopping time.

(f) If $S$ and $T$ are two stopping times, then $S \vee T$ and $S \wedge T$ are stopping times and $\mathcal{F}_{S \wedge T} = \mathcal{F}_S \cap \mathcal{F}_T$.

(g) If $S$ and $T$ are two stopping times, then $S + T$ is also a stopping time.

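Proof.

(a) Let $A \in \mathcal{F}_T$. Then for any $t > 0$ the set $A \cap \{T < t\} = \bigcup_n (A \cap \{T \le t - (1/n)\})$ is in $\mathcal{F}_t$.

(b) It is sufficient to show that, for any $t \ge 0$, the set $\{T \le t\}$ is in $\mathcal{F}_T$. For any $s \ge 0$, we have

$$\{T \le t\} \cap \{T \le s\} = \{T \le t \wedge s\} \in \mathcal{F}_s.$$

(c) Assume that $A \in \mathcal{F}_S$. Then $A \cap \{T \le t\} = A \cap \{S \le t\} \cap \{T \le t\} \in \mathcal{F}_t$ and $A \in \mathcal{F}_T$. Similarly for the $\sigma$-fields $\mathcal{F}_{S+}$ and $\mathcal{F}_{T+}$.

(d) Use the equality $A \cap \{T \le t\} = \bigcup_n (A \cap \{S < t - (1/n)\} \cap \{T \le t\})$ to conclude.

(e) Use the equality $\{U \le t\} = \{U \le t\} \cap \{T \le t\}$ to conclude.

(f) To show that $S \vee T$ is a stopping time, use the equality $\{S \vee T \le t\} = \{S \le t\} \cap \{T \le t\}$. Similarly $\{S \wedge T \le t\} = \{S \le t\} \cup \{T \le t\}$. Using (c) we see that $\mathcal{F}_{S \wedge T} \subset \mathcal{F}_S \cap \mathcal{F}_T$. Conversely, if $A \in \mathcal{F}_S \cap \mathcal{F}_T$, we have $A \cap \{S \wedge T \le t\} = (A \cap \{S \le t\}) \cup (A \cap \{T \le t\}) \in \mathcal{F}_t$.

(g) Since $S$ is $\mathcal{F}_S$-measurable, the sets $\{S \in A\} \cap \{S \le t\}$ and $\{S \in A\} \cap \{S \le t\} \cap \{T \le t\}$ are in $\mathcal{F}_t$ for any set $A \in \mathcal{B}(\mathbb{R})$; which means that the restriction of $S$ to the set $\{S \le t\} \cap \{T \le t\}$ is $\mathcal{F}_t$-measurable. A similar property holds for $T$. Therefore the restriction of $S + T$ to the set $\{S \le t\} \cap \{T \le t\}$ is $\mathcal{F}_t$-measurable, and the set $\{S + T \le t\} = \{S + T \le t\} \cap \{S \le t\} \cap \{T \le t\}$ is in $\mathcal{F}_t$. □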


9.5.4 Remark. It is often assumed in the general theory of stochastic processes that the family of sub-$\sigma$-fields is right-continuous ($\mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s$ for any $t \ge 0$); in this case the property $\{T \le t\} \in \mathcal{F}_t$ for any $t$ is equivalent to $\{T < t\} \in \mathcal{F}_t$ for any $t$, and the $\sigma$-fields $\mathcal{F}_T$ and $\mathcal{F}_{T+}$ are the same. The family of natural $\sigma$-fields $(\mathcal{B}_t)_{t \ge 0}$ of the Brownian motion is not right-continuous, but we will see in Problem 2 that $\mathcal{B}_{t+}$ and $\mathcal{B}_t$ differ only by measurable sets of probability 0.

The following lemma is useful to verify that hitting times are stopping times.

9.5.5 Lemma. Let $g(t)$ be a continuous function on $[0, \infty)$ such that $g(0) = 0$. Let $a > 0$ and $T = \inf\{t > 0: g(t) = a\} = \inf\{t > 0: g(t) \ge a\}$ (we take $T = \infty$ if there is no such $t$). Then $T \le t$ iff $\sup_{r \le t,\, r \in \mathbb{Q}} g(r) \ge a$. (A similar result holds for $a < 0$.)


Proof. As the function $g$ is continuous, $\sup_{r \le t,\, r \in \mathbb{Q}} g(r) = \sup_{s \le t} g(s)$ and $g(T) = a$. If $T \le t$, then $\sup_{s \le t} g(s) \ge g(T) \ge a$. Conversely if $\sup_{s \le t} g(s) \ge a$ there exists a point $t_0$ in $[0, t]$ such that $g(t_0) = \sup_{s \le t} g(s) \ge a$ (since $g$ is continuous), and therefore $T \le t$. □


9.5.6 Example. Let $(B_t)_{t \ge 0}$ be a Brownian motion and $a \ne 0$; the first hitting time of $a$ is defined as $T = \inf\{t > 0: B_t = a\}$. The Law of the Iterated Logarithm assures that the time $T$ is almost surely finite, and the continuity of the Brownian motion assures that $B_T = a$ on $\{T < \infty\}$. If $a > 0$, then $\{T \le t\} = \{\sup_{r \le t,\, r \in \mathbb{Q}} B_r \ge a\} \in \mathcal{B}_t$, which shows that $T$ is a stopping time for the family $(\mathcal{B}_t)_{t \ge 0}$. (A similar proof gives the result for $a < 0$.)
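A discrete analogue makes the stopping-time property tangible: whether the level has been reached by a given index can be decided from the samples up to that index alone. The sketch below works on a list of sampled values rather than a continuous path, so it uses the condition "value $\ge a$" (a sampled path can jump over the level); the names `first_passage_index` and `hit_by` and the sample path are assumptions made for illustration.

```python
def first_passage_index(path, a):
    """Index of the first sample with value >= a, or None if the level is never reached."""
    for k, value in enumerate(path):
        if value >= a:
            return k
    return None

def hit_by(path, a, k):
    """Decide the event {T <= k} from the running maximum of the samples up to index k only."""
    return max(path[: k + 1]) >= a

# A fixed sample path starting at 0; the level a = 1.0 is first reached at index 4.
path = [0.0, 0.4, -0.2, 0.9, 1.3, 0.7, 1.1]
T = first_passage_index(path, 1.0)
agree = all(hit_by(path, 1.0, k) == (T <= k) for k in range(len(path)))
```

The function `hit_by` never looks past index `k`, which mirrors the statement $\{T \le t\} \in \mathcal{B}_t$: the event is determined by the path up to time $t$.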

We want to generalize the Markov property to stopping times, but we first give a result on the measurability of $B_T$.


9.5.7 Theorem. Let $(B_t)_{t \ge 0}$ be a Brownian motion and $T$ be a finite stopping time for the $\sigma$-fields $\mathcal{B}_t = \sigma(B_s, s \le t)$, $t \ge 0$. Then $B_T$ is $\mathcal{B}_T$-measurable.


Proof. Define $X_n = B_{k/n}$ on $\{k/n \le T < (k+1)/n\}$, $k = 0, 1, 2, \ldots$. Then $X_n$ is $\mathcal{B}_T$-measurable, and $B_T = \lim_n X_n$. □


9.5.8 (Strong Markov Property). Let (B,),>9 be a Brownian motion and T be 
a finite stopping time for the family of o-fields (@;),>9. Then X, = B,,7 — Br 
is a Brownian motion independent of the o-field #7. 


Proof. Consider the times $T_n$ defined as follows:

$$T_n = \frac{k}{n} \quad \text{if } \frac{k-1}{n} \le T < \frac{k}{n}, \quad k = 1, 2, \ldots$$

The $T_n$ are stopping times (9.5.3(e)) and, if $C \in \mathcal{B}_T$, the sets $C \cap \{T_n = k/n\}$
are in $\mathcal{B}_{k/n}$. Applying 9.5.1, we get, for $A \in \mathcal{B}(\mathbb{R})$,

$$P(\{B_{t+T_n} - B_{T_n} \in A\} \cap C) = \sum_{k=1}^{\infty} P(\{B_{t+k/n} - B_{k/n} \in A,\ T_n = k/n\} \cap C)$$

$$= \sum_{k=1}^{\infty} P(B_{t+k/n} - B_{k/n} \in A)\, P(\{T_n = k/n\} \cap C)$$

$$= P(B_t \in A)\, P(C).$$

We now restrict the sets $A$ to be open intervals $(a, b)$. When $n \to \infty$,
the random variables $B_{t+T_n} - B_{T_n}$ converge a.s. to $B_{t+T} - B_T$; therefore

$$P(\{a < B_{t+T} - B_T < b\} \cap C) \le \liminf_n P(\{a < B_{t+T_n} - B_{T_n} < b\} \cap C)$$

$$\le \limsup_n P(\{a \le B_{t+T_n} - B_{T_n} \le b\} \cap C) \le P(\{a \le B_{t+T} - B_T \le b\} \cap C).$$

Letting $n$ approach infinity, we get, for $a$ and $b$ such that $P(B_{t+T} - B_T = a)
= P(B_{t+T} - B_T = b) = 0$,

$$P(\{a < B_{t+T} - B_T < b\} \cap C) = P(a < B_t < b)\, P(C). \qquad (1)$$


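For any $\varepsilon > 0$, we have, since we have already verified the Strong Markov
Property for the stopping times $T_n$,

$$P(B_{t+T} - B_T = a) \le \liminf_n P(a - \varepsilon < B_{t+T_n} - B_{T_n} < a + \varepsilon) = P(a - \varepsilon < B_t < a + \varepsilon).$$

Let $\varepsilon \to 0$ to obtain $P(B_{t+T} - B_T = a) = 0$, for any $a > 0$. Therefore equality
(1) is satisfied for all $a$ and $b \in \mathbb{R}$, and $B_{t+T} - B_T$ is a Brownian motion
independent of $\mathcal{B}_T$. □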


9.5.9 Remark. The same proof shows that, more generally, if $T$ is a finite
stopping time for the family of σ-fields $\mathcal{B}_{t+}$, then $Y_t = B_{t+T} - B_T$ is a Brownian
motion independent of the σ-field $\mathcal{B}_{T+}$: the stopping times $T_n$ defined
as above are still stopping times for the family $(\mathcal{B}_t)_{t \ge 0}$ and, as $T < T_n$, we
have $\mathcal{B}_{T+} \subset \mathcal{B}_{T_n}$. Therefore the above proof is still valid if the set $C$ is in
$\mathcal{B}_{T+}$.

In particular, if $T$ is a finite stopping time for the family $(\mathcal{B}_t)_{t \ge 0}$, then
$Y_t = B_{t+T} - B_T$ is a Brownian motion independent of the σ-field $\mathcal{B}_{T+}$.

We now apply the strong Markov property to the problem of finding the
distribution of the random variable $\sup_{s \le t} B_s$. The idea is the following: let
$a > 0$, and let $T$ denote the first hitting time of $a$ by the Brownian motion.
Since the process $Y_t = B_{t+T} - B_T$ is a Brownian motion, we have, for
any $u > 0$, $P(Y_u > 0) = P(Y_u < 0) = 1/2$. This property should still hold if
we replace $u$ by a time independent of the process $(Y_t)_{t \ge 0}$, for example
$t - T$, and we should have $P(\sup_{s \le t} B_s > a) = P(T < t) = 2P(T < t,\ Y_{t-T} > 0) = 2P(B_t > a)$.
We now give a detailed proof.


9.5.10 Theorem. Let $(B_t)_{t \ge 0}$ be a Brownian motion and $a > 0$. Then

$$P\Big(\sup_{s \le t} B_s > a\Big) = 2P(B_t > a).$$

Proof. Let $T$ be the first hitting time of $a$ and define the stopping times $T_n$
as in 9.5.8. Since the time $T_n$ is independent of $B_{t+T_n} - B_{T_n}$, we have

$$P(B_t - B_{T_n} > 0,\ T_n < t) = \sum_{k < tn} P\Big(B_t - B_{k/n} > 0,\ T_n = \frac{k}{n}\Big)$$

$$= \sum_{k < tn} P(B_t - B_{k/n} > 0)\, P\Big(T_n = \frac{k}{n}\Big)$$

$$= \tfrac{1}{2} P(T_n < t).$$

We now let $n$ approach $\infty$; as $P(B_t - B_T = 0) = P(B_t = a) = 0$ and
$P(T = t) \le P(B_t = a)$, an argument similar to the one given in 9.5.8
shows that

$$P(B_t > a) = P(B_t - B_T > 0,\ T < t) = \tfrac{1}{2} P(T < t) = \tfrac{1}{2} P\Big(\sup_{s \le t} B_s > a\Big). \quad □$$
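The identity of Theorem 9.5.10 can be illustrated numerically. The following is a minimal Monte Carlo sketch, not part of the original text: it replaces the Brownian path by a Gaussian random walk on a grid, so the simulated supremum is slightly too small and the match is only approximate. The function names, the grid size, and the number of paths are illustrative assumptions.

```python
import math
import random


def simulate_max_and_end(t, n_steps, rng):
    """Return (max over the grid of B, B_t) for one discretized Brownian path on [0, t]."""
    sd = math.sqrt(t / n_steps)  # standard deviation of one increment
    b = 0.0
    running_max = 0.0
    for _ in range(n_steps):
        b += rng.gauss(0.0, sd)
        if b > running_max:
            running_max = b
    return running_max, b


def reflection_check(a=1.0, t=1.0, n_steps=300, n_paths=3000, seed=0):
    """Estimate P(sup_{s<=t} B_s > a) and 2 P(B_t > a); also return the exact value of the latter."""
    rng = random.Random(seed)
    hits = 0
    above = 0
    for _ in range(n_paths):
        path_max, end = simulate_max_and_end(t, n_steps, rng)
        hits += path_max > a
        above += end > a
    # 2 P(B_t > a) = erfc(a / sqrt(2 t)) for a centered Gaussian with variance t.
    exact = math.erfc(a / math.sqrt(2.0 * t))
    return hits / n_paths, 2.0 * above / n_paths, exact


if __name__ == "__main__":
    p_sup, two_p_end, exact = reflection_check()
    print("P(sup > a) estimate :", p_sup)
    print("2 P(B_t > a) estimate:", two_p_end)
    print("2 P(B_t > a) exact   :", exact)
```

Refining the grid (larger `n_steps`) reduces the downward bias of the simulated supremum, at the cost of running time.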


Problems 


1. Let $(B_t)_{t \ge 0}$ be a Brownian motion, $s < t$, $A \in \mathcal{B}(\mathbb{R})$ and $\mathcal{B}_s = \sigma(B_u, u \le s)$.
Show that

$$P(B_t \in A \mid \mathcal{B}_s) = P(B_t \in A \mid B_s) = p_{t-s}(B_s, A) \quad \text{a.s.},$$

where $p_u(x, A) = \int_A \frac{1}{\sqrt{2\pi u}} \exp\left(-\frac{(y - x)^2}{2u}\right) dy$.
(Hint: use the conditional density of $B_t - B_s$ given $B_s$.)

2. In this problem we will show that $\mathcal{B}_{t+}$ and $\mathcal{B}_t$ differ only by sets of
probability 0.
(a) If $f$ is a measurable, bounded function from $\mathbb{R}$ into $\mathbb{R}$ define, for
$u > 0$,

$$p_u(x, f) = \int_{-\infty}^{\infty} f(y) \frac{1}{\sqrt{2\pi u}} \exp\left(-\frac{(y - x)^2}{2u}\right) dy.$$

Show that the function $(u, x) \to p_u(x, f)$ is continuous.
(b) Use Problem 1 to show that if $s < t$ and $f$ is a measurable, bounded
function from $\mathbb{R}$ into $\mathbb{R}$, then

$$E[f(B_t) \mid \mathcal{B}_s] = p_{t-s}(B_s, f) \quad \text{a.s.}$$

(c) Show that if $s < t$ and $f$ is a measurable, bounded function from $\mathbb{R}$
into $\mathbb{R}$, then

$$E[f(B_t) \mid \mathcal{B}_{s+}] = p_{t-s}(B_s, f) \quad \text{a.s.}$$

(d) Show that if the functions $f_1, f_2, \ldots, f_m$ are measurable and bounded
and if $s < t_1 < t_2 < \cdots < t_m$, then

$$E\Big[\prod_{i=1}^{m} f_i(B_{t_i}) \,\Big|\, \mathcal{B}_{s+}\Big] = E\Big[\prod_{i=1}^{m} f_i(B_{t_i}) \,\Big|\, \mathcal{B}_s\Big] \quad \text{a.s.}$$

(e) Show that the above equality is still true when the $t_i$ are not restricted
to being larger than $s$.

(f) Now show that, if the random variable $Z$ is bounded and
$\sigma(B_u, u \ge 0)$-measurable, then

$$E[Z \mid \mathcal{B}_{s+}] = E[Z \mid \mathcal{B}_s].$$

This shows that a $\mathcal{B}_{s+}$-measurable random variable is a.s. equal to
a $\mathcal{B}_s$-measurable random variable.

(g) Let $\mathcal{B}$ be the smallest complete σ-field generated by the random
variables $B_t$, $t \ge 0$. Define $\mathcal{N} = \{A \in \mathcal{B}: P(A) = 0\}$ and let $\mathcal{B}_t'$ be
the σ-field generated by $\mathcal{B}_t$ and $\mathcal{N}$. Show that the family of σ-fields
$(\mathcal{B}_t')_{t \ge 0}$ is right-continuous.

(Hint: use 6.4.4 for (c) and (g).)


3. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(\mathcal{F}_t)_{t \ge 0}$ be a nondecreasing family
of sub-σ-fields of $\mathcal{F}$. Let $\mathcal{G}_t = \mathcal{F}_{t+}$.
(a) Show that a nonnegative random variable $T$ is a stopping time for
the family $(\mathcal{G}_t)_{t \ge 0}$ iff $\{T < t\} \in \mathcal{F}_t$ for any $t > 0$.
(b) Show that, if $T$ is a stopping time for the family $(\mathcal{F}_t)_{t \ge 0}$, we have
$\mathcal{F}_{T+} = \mathcal{G}_T$.

4. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(\mathcal{F}_t)_{t \ge 0}$ be a nondecreasing family
of sub-σ-fields of $\mathcal{F}$.
(a) Show that, if $T$ and $S$ are two stopping times for the family $(\mathcal{F}_t)_{t \ge 0}$,
the sets $\{S < T\}$, $\{S \le T\}$, $\{S = T\}$, $\{S \ge T\}$, and $\{S > T\}$ are all in
$\mathcal{F}_S \cap \mathcal{F}_T$.
(b) Show that, if the set $A$ is in $\mathcal{F}_S$, then the set $A \cap \{S \le T\}$ is in $\mathcal{F}_T$.
(c) Show that, if the set $A$ is in $\mathcal{F}_{S+}$, then the set $A \cap \{S < T\}$ is in $\mathcal{F}_T$.
(Hint: for (a) it suffices to prove that $\{S < T\}$ and $\{S \le T\}$ are in $\mathcal{F}_T$. Use
arguments similar to those of Theorem 9.5.3(f).)
The above results generalize properties such as "if $S$ is a stopping
time, the sets $\{S < t\}$, $\{S = t\}$, $\{S \le t\}$ are all in $\mathcal{F}_t$," and "if $A \in \mathcal{F}_S$,
then $A \cap \{S \le t\} \in \mathcal{F}_t$," to the case of a nonconstant stopping time $T$.

5. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $(\mathcal{F}_t)_{t \ge 0}$ be a nondecreasing family
of sub-σ-fields of $\mathcal{F}$ and $(T_n)$ be a sequence of stopping times.
(a) Show that $\sup_n T_n$ is a stopping time.
(b) Show that, if the family $(\mathcal{F}_t)_{t \ge 0}$ is right-continuous, $\inf_n T_n$ is a
stopping time and $\mathcal{F}_{\inf_n T_n} = \bigcap_n \mathcal{F}_{T_n}$. (Show first that $\mathcal{F}_{T+} = \mathcal{F}_T$.)


9.6 MARTINGALES 

In Chapter 6, we defined the martingale and submartingale properties for
sequences $(X_n)_{n \ge 0}$ of random variables; in this section we generalize these
notions to the case of an uncountable index set $\mathbb{R}^+$. Many results of Chapter 6 can
be generalized to right-continuous submartingales, but we will give only the
results needed in Section 9.7. Two martingales will be essential in the definition
of stochastic integrals with respect to the Brownian motion: $B_t$ and $B_t^2 - t$.


9.6.1 Definition. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $(\mathcal{F}_t)_{t \ge 0}$ be
a nondecreasing family of sub-σ-fields of $\mathcal{F}$. A family of random variables
$(X_t)_{t \ge 0}$ is a martingale with respect to $(\mathcal{F}_t)_{t \ge 0}$ iff

(a) each $X_t$ is $\mathcal{F}_t$-measurable and integrable, and
(b) $E[X_t \mid \mathcal{F}_s] = X_s$ a.s. if $0 \le s < t$.

The notions of sub- and supermartingale can be similarly generalized.

If $\mathcal{F}_t = \sigma(X_s, s \le t)$, we will say that $(X_t)_{t \ge 0}$ is a martingale without
specifying the family of σ-fields.


9.6.2 Examples. 1. The Brownian motion $(B_t)_{t \ge 0}$ is a martingale: if $s < t$,
then, since the random variable $B_t - B_s$ is independent of $\mathcal{B}_s$ and has mean 0,

$$E[B_t - B_s \mid \mathcal{B}_s] = E[B_t - B_s] = 0.$$

2. The process $Y_t = B_t^2 - t$ is also a martingale: we start by noticing that, if
$s < t$,

$$E[(B_t - B_s)^2 \mid \mathcal{B}_s] = E[(B_t - B_s)^2] = t - s,$$

and

$$E[(B_t - B_s)^2 \mid \mathcal{B}_s] = E[B_t^2 - 2B_tB_s + B_s^2 \mid \mathcal{B}_s] = E[B_t^2 - B_s^2 \mid \mathcal{B}_s],$$

so that

$$E[B_t^2 - B_s^2 \mid \mathcal{B}_s] = t - s.$$
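The two martingale properties above can be checked by simulation. The sketch below is an illustration only and not part of the original text: it tests the defining relation on one event of $\mathcal{B}_s$, namely $\{B_s > \text{level}\}$, by estimating $E[(Y_t - Y_s) I_{\{B_s > \text{level}\}}]$ for $Y_u = B_u$, for $Y_u = B_u^2 - u$, and, as a contrast, for the uncompensated process $B_u^2$. The function name, the choice of event, and the sample size are assumptions made for the example.

```python
import random


def martingale_defects(s=1.0, t=2.0, level=0.5, n_paths=20000, seed=1):
    """Estimate E[(Y_t - Y_s) 1{B_s > level}] for three choices of the process Y.

    Returns the estimates for Y_u = B_u, Y_u = B_u^2 - u and Y_u = B_u^2.
    The first two should be close to 0; the third should be close to
    (t - s) * P(B_s > level), which is not 0.
    """
    rng = random.Random(seed)
    sum_linear = 0.0
    sum_compensated = 0.0
    sum_raw_square = 0.0
    for _ in range(n_paths):
        b_s = rng.gauss(0.0, s ** 0.5)                # B_s ~ N(0, s)
        b_t = b_s + rng.gauss(0.0, (t - s) ** 0.5)    # independent increment
        if b_s > level:
            sum_linear += b_t - b_s
            sum_compensated += (b_t * b_t - t) - (b_s * b_s - s)
            sum_raw_square += b_t * b_t - b_s * b_s
    return (sum_linear / n_paths,
            sum_compensated / n_paths,
            sum_raw_square / n_paths)


if __name__ == "__main__":
    linear, compensated, raw = martingale_defects()
    print("B_t          :", linear)
    print("B_t^2 - t    :", compensated)
    print("B_t^2 (raw)  :", raw)
```

With the default parameters the third value should be near $(t - s)\,P(B_s > 0.5) \approx 0.31$, which shows why the term $-t$ is needed.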


To work with the uncountable index set $\mathbb{R}^+$, we need some regularity
conditions on martingales. This is why our first step is to show that, under mild
conditions for the family of sub-σ-fields, we can find a right-continuous
version of a martingale. The argument is similar to the proof of the existence of
a continuous version of the Brownian motion: we first study the martingale on
a countable dense subset $S$ of $\mathbb{R}^+$, and we then construct a right-continuous
version by taking limits along $S$.

9.6.3 Theorem. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $(\mathcal{F}_t)_{t \ge 0}$ a nondecreasing
right-continuous family of sub-σ-fields. We assume that the probability
space is complete in the sense of 1.3.7, and that each $\mathcal{F}_t$ contains the set
$\mathcal{N} = \{A \in \mathcal{F}: P(A) = 0\}$. Then any martingale $(X_t)_{t \ge 0}$ with respect to $(\mathcal{F}_t)_{t \ge 0}$
admits a right-continuous version.


Proof. (a) Let $S$ be a countable dense subset of $\mathbb{R}^+$, $a$ and $b$ be two real
numbers such that $a < b$, and let $S_n = S \cap [0, n]$. For any finite subset $I$ of
$S_n$, it follows from Theorem 6.4.2 that the number $U_{ab}(I)$ of upcrossings of
$(a, b)$ by the process $(X_t)_{t \in I}$ satisfies

$$E[U_{ab}(I)] \le \frac{1}{b - a} E[(X_n - a)^+],$$

and therefore, for any $\lambda > 0$,

$$P(U_{ab}(I) \ge \lambda) \le \frac{1}{\lambda (b - a)} E[(X_n - a)^+]. \qquad (1)$$

Let $I_k$, $k = 1, 2, \ldots$, be an increasing sequence of finite subsets of $S_n$ such
that $\bigcup_k I_k = S_n$. The set $\{\lim_k U_{ab}(I_k) = \infty\}$ is measurable, has probability
0 and contains the set

$$A_{n,ab} = \{\omega: \exists t \in [0, n) \text{ such that } \liminf_{s \to t,\, s > t,\, s \in S} X_s < a < b < \limsup_{s \to t,\, s > t,\, s \in S} X_s\}.$$

[If $P(\lim_k U_{ab}(I_k) = \infty) = \alpha > 0$,
then, since the $U_{ab}(I_k)$ increase with $k$, for any $\lambda > 0$ there exists $k$ such
that $P(U_{ab}(I_k) > \lambda) > \alpha/2$. For $\lambda$ large, this is contradictory to inequality
(1).] Since $\mathcal{F}$ is complete for the probability $P$, $A_{n,ab}$ is measurable and
$P(A_{n,ab}) = 0$.

(b) Let

$$A = \{\omega: \exists t \in [0, \infty) \text{ such that } \liminf_{s \to t,\, s > t,\, s \in S} X_s < \limsup_{s \to t,\, s > t,\, s \in S} X_s\}.$$

The set $A$ is contained in the set $\bigcup_{n,a,b} A_{n,ab}$, where the union is taken for
all integer values of $n$ and all rationals $a < b$; since $P(\bigcup_{n,a,b} A_{n,ab}) = 0$, $A$ is
measurable and has probability 0. Define the process

$$Y_t = \begin{cases} \lim_{s \to t,\, s > t,\, s \in S} X_s, & \text{on the complement of } A, \\ 0, & \text{on } A. \end{cases}$$

Since $\mathcal{F}_t = \mathcal{F}_{t+}$ contains all the sets in $\mathcal{N} = \{A \in \mathcal{F}: P(A) = 0\}$, the random
variable $Y_t$ is $\mathcal{F}_t$-measurable.

In the equality $X_s = E[X_n \mid \mathcal{F}_s]$, $s < n$, let $s$ decrease to $t$; $X_s$ converges
a.s. to $Y_t$, and $E[X_n \mid \mathcal{F}_s]$ converges to $E[X_n \mid \mathcal{F}_{t+}] = E[X_n \mid \mathcal{F}_t] = X_t$ a.s.
(6.4.4). Therefore $(Y_t)_{t \ge 0}$ is a right-continuous version of the martingale
$(X_t)_{t \ge 0}$ and is itself a martingale. □

We now give a couple of results necessary to prove 9.6.8, which will be
essential in the construction of stochastic integrals.


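9.6.4 Lemma. Let $(X_t)_{t \ge 0}$ be a right-continuous submartingale with respect
to a family $(\mathcal{F}_t)_{t \ge 0}$ of sub-σ-fields. Then for any $\lambda > 0$ we have

$$\lambda P\Big\{\sup_{s \le t} X_s > \lambda\Big\} \le \int_{\{\sup_{s \le t} X_s > \lambda\}} X_t \, dP.$$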


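Proof. We first give the proof when the index set is finite. We consider the
submartingale on the index set $t_1 < t_2 < \cdots < t_k$. Let $T$ be the stopping time
defined by

$$T = \begin{cases} \min\{t_i: X_{t_i} > \lambda\}, & \text{on the set } \{\max_i X_{t_i} > \lambda\}, \\ t_k, & \text{on the set } \{\max_i X_{t_i} \le \lambda\}. \end{cases}$$

We have

$$\lambda P\Big(\max_i X_{t_i} > \lambda\Big) \le \sum_{i=1}^{k-1} E[X_{t_i} I_{\{T = t_i\}}] + E[X_{t_k} I_{\{T = t_k,\ \max_i X_{t_i} > \lambda\}}]$$

$$\le \sum_{i=1}^{k-1} E[X_{t_k} I_{\{T = t_i\}}] + E[X_{t_k} I_{\{T = t_k,\ \max_i X_{t_i} > \lambda\}}]$$

$$= \int_{\{\max_i X_{t_i} > \lambda\}} X_{t_k} \, dP.$$

Consider an increasing sequence of finite sets $I_j$ such that $\bigcup_j I_j = Q_t$, where
$Q_t = \{\text{all rationals smaller than } t\} \cup \{t\}$. Since the random variable $X_t$ is
integrable, and the sets $\{\max_{s \in I_j} X_s > \lambda\}$ increase to the set $\{\sup_{s \in Q_t} X_s > \lambda\}$,
we obtain

$$\lambda P\Big\{\sup_{s \in Q_t} X_s > \lambda\Big\} \le \int_{\{\sup_{s \in Q_t} X_s > \lambda\}} X_t \, dP. \qquad (1)$$

Since the submartingale $(X_t)_{t \ge 0}$ is right-continuous, we have

$$\sup_{s \le t} X_s = \sup_{s \in Q_t} X_s,$$

and

$$\lambda P\Big\{\sup_{s \le t} X_s > \lambda\Big\} \le \int_{\{\sup_{s \le t} X_s > \lambda\}} X_t \, dP. \quad □$$

9.6.5 Lemma. If $(X_t)_{t \ge 0}$ is a right-continuous martingale with respect to
$(\mathcal{F}_t)_{t \ge 0}$, and each $X_t$ is in $L^2$, then we have

$$\Big\| \sup_{s \le t} |X_s| \Big\|_2 \le 2 \|X_t\|_2.$$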


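Proof. Let $Y = \sup_{s \le t} |X_s|$. The process $|X_t|$ is a submartingale, and therefore,
by Lemma 9.6.4, we have

$$\lambda P(Y > \lambda) \le \int_{\{Y > \lambda\}} |X_t| \, dP.$$

Since we do not know yet that $\|Y\|_2 < \infty$, we have to work with the random
variables $Y_n = Y \wedge n$. The set $\{Y_n > \lambda\}$ is empty if $\lambda \ge n$, and coincides with
$\{Y > \lambda\}$ if $\lambda < n$. Therefore $Y_n$ satisfies

$$\lambda P(Y_n > \lambda) \le \int_{\{Y_n > \lambda\}} |X_t| \, dP. \qquad (1)$$

Let $F_n$ be the distribution function of $Y_n$. Since $s^2 = \int_0^s 2\lambda \, d\lambda$, it follows that

$$E[Y_n^2] = \int_0^\infty s^2 \, dF_n(s) = \int_0^\infty \int_0^s 2\lambda \, d\lambda \, dF_n(s)$$

$$= \int_0^\infty 2\lambda P(Y_n > \lambda) \, d\lambda \qquad \text{by Fubini's Theorem}$$

$$\le 2 \int_0^\infty \int_\Omega |X_t| I_{\{Y_n > \lambda\}} \, dP \, d\lambda \qquad \text{by (1)}$$

$$= 2 \int_\Omega |X_t| \int_0^\infty I_{\{Y_n > \lambda\}} \, d\lambda \, dP$$

$$= 2E[|X_t| Y_n] \le 2 \|X_t\|_2 \|Y_n\|_2,$$

and

$$\|Y_n\|_2 \le 2 \|X_t\|_2.$$

We finish by letting $n$ approach $\infty$. □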
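As an illustration of Lemma 9.6.5, the following sketch (not from the original text) estimates both sides of the inequality for the Brownian motion, using a Gaussian random walk as a stand-in for the continuous path. The function name and the discretization parameters are assumptions chosen for the example.

```python
import math
import random


def doob_l2_check(t=1.0, n_steps=200, n_paths=2000, seed=2):
    """Estimate || sup_{s<=t} |B_s| ||_2 and || B_t ||_2 on discretized paths."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)
    sum_sup_sq = 0.0
    sum_end_sq = 0.0
    for _ in range(n_paths):
        b = 0.0
        sup_abs = 0.0
        for _ in range(n_steps):
            b += rng.gauss(0.0, sd)
            sup_abs = max(sup_abs, abs(b))
        sum_sup_sq += sup_abs * sup_abs
        sum_end_sq += b * b
    lhs = math.sqrt(sum_sup_sq / n_paths)  # L2 norm of the running supremum
    rhs = math.sqrt(sum_end_sq / n_paths)  # L2 norm of the terminal value
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = doob_l2_check()
    print("|| sup |B_s| ||_2 ~", lhs)
    print("2 || B_t ||_2     ~", 2.0 * rhs)
```

The estimate of the left side should fall between $\|B_t\|_2$ and $2\|B_t\|_2$; the constant 2 is an upper bound and need not be attained for this particular martingale.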


We now assume that the hypotheses of 9.6.3 are satisfied: $(\Omega, \mathcal{F}, P)$ is
a probability space, and $(\mathcal{F}_t)_{t \ge 0}$ a nondecreasing right-continuous family of
sub-σ-fields. The probability space is complete in the sense of 1.3.7, and each
$\mathcal{F}_t$ contains the set $\mathcal{N} = \{A \in \mathcal{F}: P(A) = 0\}$.

9.6.6 Definition. We denote by $\mathcal{M}_a$ the vector space of all processes $X
= (X_t)_{0 \le t \le a}$ such that $(X_t)_{0 \le t \le a}$ is a right-continuous $(\mathcal{F}_t)_{0 \le t \le a}$-martingale
and $X_a$ is in $L^2$.

If two elements $X$ and $Y$ of $\mathcal{M}_a$ are versions of each other, the set $\{\omega: \exists t
\in [0, a] \text{ such that } X_t(\omega) \ne Y_t(\omega)\}$ is measurable and has probability 0 (see
9.1). In this case, we will not distinguish between the processes $X$ and $Y$.

We consider on $\mathcal{M}_a$ the inner product $\langle X, Y \rangle = E[X_a Y_a]$.


9.6.7 Theorem.
(a) If $\langle X, X \rangle = 0$, the martingale $X$ is indistinguishable from the process
identically equal to 0. Therefore $\|X\|_{\mathcal{M}_a} = \sqrt{\langle X, X \rangle}$ defines a norm on
$\mathcal{M}_a$.

(b) If a sequence $X_n$, $n = 1, 2, \ldots$, converges to $X$ in $\mathcal{M}_a$, then
$\sup_{t \le a} |X_{n,t} - X_t| \to 0$ in $L^2$. (In particular, for any $0 \le t \le a$, $X_{n,t} \to X_t$
in $L^2$.)

Proof. This follows from Lemma 9.6.5. □


9.6.8 Theorem. The space $\mathcal{M}_a$ is a Hilbert space for the inner product
$\langle X, Y \rangle = E[X_a Y_a]$, and the subspace of continuous martingales is closed in
$\mathcal{M}_a$.

Proof. Let $X_n$ be a Cauchy sequence in $\mathcal{M}_a$. The sequence of random
variables $X_{n,a}$ is a Cauchy sequence in $L^2$; let $Z_a$ be its $L^2$-limit, and $Z_t
= E(Z_a \mid \mathcal{F}_t)$, $0 \le t \le a$. The martingale $(Z_t)_{0 \le t \le a}$ admits a right-continuous
version $Y = (Y_t)_{0 \le t \le a}$ (9.6.3), and the $X_n$ converge to $Y$ in $\mathcal{M}_a$.

Assume now that the martingales $X_n$ are continuous, and let $X$ be their
right-continuous limit in $\mathcal{M}_a$. We can assume that $\sum_n E[(X_{n,a} - X_a)^2] < \infty$ (if
necessary use a subsequence). Applying successively 9.6.5 and Chebyshev's
inequality, we have

$$\sum_n E\Big[\sup_{t \le a} |X_{n,t} - X_t|^2\Big] < \infty$$

and

$$\sum_n P\Big(\sup_{t \le a} |X_{n,t} - X_t| \ge \varepsilon\Big) < \infty,$$

for any $\varepsilon > 0$. By the first Borel-Cantelli lemma we get

$$P\Big(\limsup_n \Big\{\sup_{t \le a} |X_{n,t} - X_t| \ge \varepsilon\Big\}\Big) = 0,$$

and, taking a sequence $\varepsilon_p \to 0$, we see that

$$P\Big(\Big\{\sup_{t \le a} |X_{n,t} - X_t| \not\to 0\Big\}\Big) \le \sum_p P\Big(\limsup_n \Big\{\sup_{t \le a} |X_{n,t} - X_t| \ge \varepsilon_p\Big\}\Big) = 0.$$

For almost all $\omega$, the right-continuous path $t \to X_t(\omega)$, $t \in [0, a]$, is a uniform
limit of the continuous paths $t \to X_{n,t}(\omega)$, and the martingale $X$ is
indistinguishable on $[0, a]$ from a continuous martingale. □


Problems 


1. Let $(X_t)_{t \ge 0}$ be a process with independent increments such that $X_t \in L^1$
for all $t \ge 0$. Show that $Y_t = X_t - E(X_t)$ is a martingale.

2. (a) Let $(B_t)_{t \ge 0}$ be a Brownian motion and $\mathcal{B}_t = \sigma(B_s, s \le t)$. Show that,
for any $u \in \mathbb{R}$, the process

$$Y_t = \exp\left(iuB_t + \tfrac{1}{2}u^2 t\right)$$

is a martingale for the family $(\mathcal{B}_t)_{t \ge 0}$.
(b) Conversely, let $(X_t)_{t \ge 0}$ be a continuous stochastic process such that
$X_0 = 0$ and for any $u \in \mathbb{R}$ the process

$$Y_t = \exp\left(iuX_t + \tfrac{1}{2}u^2 t\right)$$

is a martingale with respect to the family $(\mathcal{F}_t = \sigma(X_s, s \le t))_{t \ge 0}$.
Show that $(X_t)_{t \ge 0}$ is a Brownian motion.

3. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $(\mathcal{F}_t)_{t \ge 0}$ a nondecreasing
right-continuous family of sub-σ-fields and $(X_t)_{t \ge 0}$ a right-continuous
martingale.

(a) Show that, if $T$ is a stopping time bounded by $t_0$, then $X_T$ is
$\mathcal{F}_T$-measurable and $X_T = E[X_{t_0} \mid \mathcal{F}_T]$ a.s.

(b) Show that, if $T$ is a bounded stopping time, then the process $Y_t =
X_{t \wedge T}$, $t \ge 0$, is a martingale.

(Hint: use the stopping times $T_n = k/n$ if $(k-1)/n \le T < k/n$, $k = 1,
2, \ldots$, Corollary 6.4.4 and Problem 5 of 9.5.)

4. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $(\mathcal{F}_t)_{t \ge 0}$ a nondecreasing
right-continuous family of sub-σ-fields and $(X_t)_{t \ge 0}$ a right-continuous
martingale. We assume that the random variables $X_t$, $t \ge 0$, are uniformly
integrable.

(a) Show that $X_\infty = \lim_{t \to \infty} X_t$ exists a.s. and that $(X_t)_{0 \le t \le \infty}$ is a
martingale.

(b) Show that, for any stopping time $T$ (finite or not), $X_T$ is $\mathcal{F}_T$-measurable,
integrable and $X_T = E[X_\infty \mid \mathcal{F}_T]$ a.s.

(Hint: the inequalities and methods of 6.4 extend easily to right-continuous
martingales.)


9.7 ITÔ INTEGRALS

One possible mathematical model for the price of shares in the stock market
is the Brownian motion: the price of one share of a particular stock is assumed
to equal $a + \lambda B_t(\omega)$ at time $t$. At times $0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = t$, an
investor decides the amount $f(t_i, \omega)$ of shares of the stock to have during
time $(t_i, t_{i+1})$. Then the gain in the time interval $(0, t)$ is

$$\lambda \sum_{i=0}^{n-1} f(t_i, \omega)\,(B_{t_{i+1}}(\omega) - B_{t_i}(\omega)).$$

At each time $t_i$, the investor knows only what has happened up to time $t_i$.
Therefore each random variable $f(t_i, \cdot)$ should be $\mathcal{B}_{t_i}$-measurable. What happens
when the time intervals between consecutive decisions become smaller?
Does the above sum have a limit? That is, can we define integrals of the form
$\int f(t, \omega) \, dB_t(\omega)$? If the paths of the Brownian motion had a.s. finite variations
on finite intervals, we could use 2.3.3 and 1.4.4; for almost all $\omega$, the function
$t \to B_t(\omega)$ would be a difference of two continuous, nondecreasing functions
and $\int f(t, \omega) \, dB_t(\omega)$ could be defined, for each $\omega$, as a Lebesgue-Stieltjes
integral. Unfortunately, this is not the case and we have to use martingale theory
to get around the difficulty. The symbol $\int_0^t f(s, \omega) \, dB_s$, $t \le a$, will no longer
be defined path by path, but as a limit in $\mathcal{M}_a$.

Let $(B_t)_{t \ge 0}$ be a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$, and let
$\mathcal{B}$ denote the smallest complete σ-field generated by all the random variables
$B_s$, $s \ge 0$. We will work with the σ-fields $\mathcal{B}_t'$, where $\mathcal{B}_t'$ is the σ-field
generated by $\mathcal{B}_t$ and $\mathcal{N} = \{A \in \mathcal{B}: P(A) = 0\}$. The family $(\mathcal{B}_t')_{t \ge 0}$ is
right-continuous (9.5 Problem 2); therefore the probability space $(\Omega, \mathcal{B}, P)$ and the
family $(\mathcal{B}_t')_{t \ge 0}$ satisfy the hypotheses of Theorems 9.6.3 and 9.6.8.

Which processes $f(t, \omega)$ do we want to integrate? If we want $\int f(t, \omega) \, dB_t$
to be a random variable, the process $f$ should be measurable in both variables.
We will also assume, as in the example of the investor, that, for each $t$, the
random variable $\omega \to f(t, \omega)$ does not depend on the future. Therefore we
consider only processes $f(t, \omega)$ satisfying the following conditions:

(a) the process $f(t, \omega)$ is measurable, and
(b) the process $f(t, \omega)$ is non-anticipating (or adapted): for each $t$ the
random variable $\omega \to f(t, \omega)$ is $\mathcal{B}_t'$-measurable.


For simple processes $f(t, \omega)$ we know what the integral should be.
(a) If $f(t, \omega) = I_{(u,v]}(t)$, then $\int_0^a f(t, \omega) \, dB_t = B_{v \wedge a} - B_{u \wedge a}$.
(b) If $f(t, \omega) = I_{(u,v]}(t)\,\varphi(\omega)$, then $\int_0^a f(t, \omega) \, dB_t = \varphi(\omega)(B_{v \wedge a} - B_{u \wedge a})$.
For the moment we choose $a > 0$ and we restrict ourselves to processes
that vanish outside of $[0, a] \times \Omega$.

9.7.1 Definition.

(a) We denote by $\mathcal{S}_a$ the set of processes

$$f(t, \omega) = \varphi_0(\omega) I_{\{0\}}(t) + \sum_{i=0}^{n-1} \varphi_i(\omega) I_{(t_i, t_{i+1}]}(t)$$

such that
(i) $0 = t_0 < t_1 < \cdots < t_n = a$ is a finite partition of $[0, a]$,
(ii) $\varphi_0$ is $\mathcal{B}_0'$-measurable,
(iii) for $i \ge 1$, $\varphi_i$ is $\mathcal{B}_{t_i}'$-measurable, and
(iv) each $\varphi_i$ is in $L^2$.
For such a process we define, for $0 \le t \le a$,

$$Y_t = \int_0^t f(s, \omega) \, dB_s(\omega) = \sum_{i=0}^{n-1} \varphi_i(\omega)\,(B_{t_{i+1} \wedge t}(\omega) - B_{t_i \wedge t}(\omega)).$$

(b) We denote by $\mathcal{L}_a$ the set of measurable, nonanticipating processes
$g(t, \omega)$ which vanish outside the set $[0, a] \times \Omega$, and such that
$E[\int_0^a g^2(s, \omega) \, ds] < \infty$.

For processes in $\mathcal{S}_a$ we have the following straightforward lemma.


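9.7.2 Lemma. If $f$ and $g$ are two processes in $\mathcal{S}_a$, then the processes $Y_t =
\int_0^t f(s) \, dB_s$ and $Z_t = \int_0^t g(s) \, dB_s$, $0 \le t \le a$, satisfy the following properties.

(a) $(Y_t)_{0 \le t \le a}$ and $(Z_t)_{0 \le t \le a}$ are continuous martingales for the family of
σ-fields $(\mathcal{B}_t')_{0 \le t \le a}$.

(b) The martingales $Y$ and $Z$ are in $\mathcal{M}_a$ and, for $t \le a$, we have $E[Y_t Z_t]
= E[\int_0^t f(s) g(s) \, ds]$.

(c) In particular $E[Y_t^2] = E[\int_0^t f^2(s) \, ds]$, $t \le a$, and, in $\mathcal{M}_a$,
$\langle Y, Z \rangle = E[\int_0^a f(s) g(s) \, ds]$.

(d) If $f \in \mathcal{S}_a$ and $0 \le t \le a$ we have $\int_0^t f(s) \, dB_s = \int_0^a f(s) I_{[0,t]}(s) \, dB_s$.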


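Proof. If the $\varphi_i$'s are in $L^2$ and are $\mathcal{B}_{t_i}'$-measurable, then we have

$$E[\varphi_i (B_{t_{i+1}} - B_{t_i}) \mid \mathcal{B}_{t_i}'] = \varphi_i E[(B_{t_{i+1}} - B_{t_i}) \mid \mathcal{B}_{t_i}'] = 0,$$

$$E[\varphi_i^2 (B_{t_{i+1}} - B_{t_i})^2] = E[\varphi_i^2 E[(B_{t_{i+1}} - B_{t_i})^2 \mid \mathcal{B}_{t_i}']] = E[\varphi_i^2 (t_{i+1} - t_i)],$$

and for $i < j$,

$$E[\varphi_i (B_{t_{i+1}} - B_{t_i})\, \varphi_j (B_{t_{j+1}} - B_{t_j})] = E[\varphi_i (B_{t_{i+1}} - B_{t_i})\, \varphi_j\, E[(B_{t_{j+1}} - B_{t_j}) \mid \mathcal{B}_{t_j}']] = 0. \quad □$$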
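The computations in this proof can be illustrated with a concrete simple process. In the sketch below, which is not part of the original text, the coefficient on each interval of a uniform partition of $[0, a]$ is taken to be the value of the Brownian motion at the left endpoint, so that it is known at that time. The code then estimates $E[Y_a]$, $E[Y_a^2]$ and $E[\int_0^a f^2(s)\,ds]$. The function name, the partition, and the choice of coefficients are assumptions made for the example.

```python
import random


def simple_integral_moments(a=1.0, n_intervals=10, n_paths=20000, seed=3):
    """Monte Carlo moments of Y_a = sum_i phi_i (B_{t_{i+1}} - B_{t_i}) with phi_i = B_{t_i}.

    Returns estimates of E[Y_a], E[Y_a^2] and E[sum_i phi_i^2 (t_{i+1} - t_i)].
    """
    rng = random.Random(seed)
    dt = a / n_intervals
    sd = dt ** 0.5
    sum_y = 0.0
    sum_y2 = 0.0
    sum_energy = 0.0
    for _ in range(n_paths):
        b = 0.0
        y = 0.0
        energy = 0.0
        for _ in range(n_intervals):
            phi = b                       # coefficient fixed at the left endpoint
            incr = rng.gauss(0.0, sd)     # Brownian increment over the interval
            y += phi * incr
            energy += phi * phi * dt
            b += incr
        sum_y += y
        sum_y2 += y * y
        sum_energy += energy
    return sum_y / n_paths, sum_y2 / n_paths, sum_energy / n_paths


if __name__ == "__main__":
    mean_y, mean_y2, mean_energy = simple_integral_moments()
    print("E[Y_a]              ~", mean_y)
    print("E[Y_a^2]            ~", mean_y2)
    print("E[int f^2 ds]       ~", mean_energy)
```

For this choice the exact value of $E[\int_0^a f^2(s)\,ds]$ is $\sum_i t_i (t_{i+1} - t_i) = 0.45$ when $a = 1$ and the partition has 10 intervals, and the second and third estimates should agree up to sampling error.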




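9.7.3 Theorem.

(a) The mapping $f \to (Y_t = \int_0^t f(s) \, dB_s)_{0 \le t \le a}$ is an isometry from $\mathcal{S}_a$ to
$\mathcal{M}_a$, where on $\mathcal{S}_a$ we use the norm of the Hilbert space $L^2([0, a] \times
\Omega, \mathcal{B}([0, a]) \times \mathcal{F}, ds \times dP)$; the mapping can therefore be extended to
the closure of $\mathcal{S}_a$.

(b) The closure of $\mathcal{S}_a$ in $L^2([0, a] \times \Omega, \mathcal{B}([0, a]) \times \mathcal{F}, ds \times dP)$ contains
$\mathcal{L}_a$.

(c) If $f \in \mathcal{L}_a$ the process $Y_t = \int_0^t f(s) \, dB_s$, thus defined as a limit in $\mathcal{M}_a$,
is a continuous martingale, and, for each $t \le a$, $E[Y_t^2] = E[\int_0^t f^2(s) \, ds]$.

(d) If $f$ and $g$ are two processes in $\mathcal{L}_a$, the processes $Y_t = \int_0^t f(s) \, dB_s$ and
$Z_t = \int_0^t g(s) \, dB_s$, $0 \le t \le a$, satisfy $E[Y_t Z_t] = E[\int_0^t f(s) g(s) \, ds]$.

(e) If $f \in \mathcal{L}_a$ and $t \le a$ we have $\int_0^t f(s) \, dB_s = \int_0^a f(s) I_{[0,t]}(s) \, dB_s$ a.s.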

Proof. (a) is a consequence of 9.7.2, and (c) a consequence of 9.6.8. Properties
(d) and (e) extend from $\mathcal{S}_a$ to its closure, so we just have to prove (b).
We will follow the proof in Doob (1953).

1. Let $f(t)$ be a bounded Lebesgue-measurable function of $t$, which vanishes
outside a finite interval. Then, by 2.4.14, for every $\varepsilon > 0$ there exists a
continuous, bounded function $f_\varepsilon$ which vanishes outside a finite interval and
such that

$$\int_{-\infty}^{\infty} |f(t) - f_\varepsilon(t)|^2 \, dt < \varepsilon^2.$$

[By 2.4.14 there exists a continuous function $f_\varepsilon^*$ such that $\int_{\mathbb{R}} |f(t)
- f_\varepsilon^*(t)|^2 \, dt < \varepsilon^2$. Assume that the function $f$ vanishes outside a finite interval
$[a, b]$. Let $k$ be the continuous function which is 1 in $[a, b]$, 0 outside of
$[a - 1, b + 1]$ and linear elsewhere. The function $f_\varepsilon = k f_\varepsilon^*$ vanishes outside
a finite interval and $\int_{-\infty}^{\infty} |f(t) - f_\varepsilon(t)|^2 \, dt \le \int_{-\infty}^{\infty} |f(t) - f_\varepsilon^*(t)|^2 \, dt < \varepsilon^2$.]

Using Minkowski's inequality and the dominated convergence theorem, we
obtain

$$\limsup_{h \to 0} \left( \int_{-\infty}^{\infty} |f(t + h) - f(t)|^2 \, dt \right)^{1/2} \le \limsup_{h \to 0} \left( \int_{-\infty}^{\infty} |f_\varepsilon(t + h) - f_\varepsilon(t)|^2 \, dt \right)^{1/2} + 2\varepsilon = 2\varepsilon,$$

so that

$$\lim_{h \to 0} \int_{-\infty}^{\infty} |f(t + h) - f(t)|^2 \, dt = 0.$$

2. Assume now that $g(t, \omega)$ is a bounded, measurable, nonanticipating
process that vanishes for $t \notin [0, a]$; we want to show that $g$ is in the closure
of $\mathcal{S}_a$.

We partition $\mathbb{R}$ into intervals $(j/2^n, (j + 1)/2^n]$ and consider the process

$$g_{n,s}(t, \omega) = g(\alpha_n(t - s) + s, \omega) \quad \text{where } \alpha_n(u) = j/2^n \text{ on } (j/2^n, (j + 1)/2^n].$$

For fixed $n$ and $s$ the process $g_{n,s}(t, \omega)$ is equal to $g(s + j/2^n, \omega)$ on the
interval $(s + j/2^n, s + (j + 1)/2^n]$. Therefore the restriction of $g_{n,s}(t, \omega)$ to
$[0, a] \times \Omega$ belongs to $\mathcal{S}_a$. We now show that there exists an $s$ and a sequence
$n_j$ such that

$$\lim_{j \to \infty} E\left[ \int_0^a |g_{n_j,s}(t, \omega) - g(t, \omega)|^2 \, dt \right] \le \lim_{j \to \infty} E\left[ \int_{-\infty}^{\infty} |g_{n_j,s}(t, \omega) - g(t, \omega)|^2 \, dt \right] = 0.$$

This will show that the process $g$ is in the closure of $\mathcal{S}_a$.
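Part 1 assures that for each $\omega$

$$\lim_{h \to 0} \int_{-\infty}^{\infty} |g(s + h, \omega) - g(s, \omega)|^2 \, ds = 0.$$

Since Lebesgue measure is translation-invariant we have, for each fixed $t$ and
$\omega$,

$$\lim_{h \to 0} \int_{-\infty}^{\infty} |g(s + t + h, \omega) - g(s + t, \omega)|^2 \, ds = 0.$$

We can replace $h$ by $\alpha_n(t) - t$, which approaches 0 as $n \to \infty$, and obtain

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} |g(\alpha_n(t) + s, \omega) - g(t + s, \omega)|^2 \, ds = 0.$$

Since all the processes considered are bounded and vanish outside a fixed
finite interval, we have

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} E\left[ \int_{-\infty}^{\infty} |g(\alpha_n(t) + s, \omega) - g(t + s, \omega)|^2 \, dt \right] ds$$

$$= \lim_{n \to \infty} E\left[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |g(\alpha_n(t) + s, \omega) - g(t + s, \omega)|^2 \, ds \, dt \right] = 0.$$

Therefore there exists a subsequence $n_j$ such that

$$\lim_{j \to \infty} E\left[ \int_{-\infty}^{\infty} |g(\alpha_{n_j}(t) + s, \omega) - g(t + s, \omega)|^2 \, dt \right] = 0 \quad \text{for almost all } s.$$

We have now shown that there exists a subsequence $n_j$ and at least one $s$ such
that

$$\lim_{j \to \infty} E\left[ \int_{-\infty}^{\infty} |g(\alpha_{n_j}(t - s) + s, \omega) - g(t, \omega)|^2 \, dt \right]$$

$$= \lim_{j \to \infty} E\left[ \int_{-\infty}^{\infty} |g(\alpha_{n_j}(t) + s, \omega) - g(t + s, \omega)|^2 \, dt \right] = 0,$$

and $g$ is in the closure of $\mathcal{S}_a$.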


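3. If the process $g$ on $[0, a]$ is measurable, nonanticipating and satisfies

$$E\left[ \int_0^a g^2(t) \, dt \right] < \infty,$$

we define the processes

$$g_n = \begin{cases} g, & \text{if } |g| \le n, \\ 0, & \text{otherwise.} \end{cases}$$

According to part 2, each $g_n$ is in the closure of $\mathcal{S}_a$. The dominated convergence
theorem assures that

$$E\left[ \int_0^a |g(t) - g_n(t)|^2 \, dt \right] \to 0,$$

and $g$ is in the closure of $\mathcal{S}_a$. □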


The stochastic integrals $Y_t = \int_0^t g(s) \, dB_s$ that we have thus defined are limits
in $L^2$, so the random variables $Y_t$ are only determined a.s. Theorem 9.7.3
assures the existence of continuous versions of $(Y_t)_{0 \le t \le a}$, and, when we talk
about stochastic integrals, we always assume that we are working with a
continuous version; it does not matter which, since two continuous versions
are indistinguishable.

We now generalize the definition of stochastic integrals to the interval
$[0, \infty)$.


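9.7.4 Theorem. Let $\mathcal{L}$ be the set of measurable, nonanticipating processes
$g(t, \omega)$ on $[0, \infty) \times \Omega$ such that $E[\int_0^a g(t)^2 \, dt] < \infty$ for any $a > 0$.

(a) There exists a continuous martingale $(Y_t)_{t \ge 0}$ such that $Y_t = \int_0^t g(s) \, dB_s$
a.s. for any $t \ge 0$.

(b) If $f$ and $g$ are in $\mathcal{L}$, then $\int_0^t (\alpha f + \beta g)(s) \, dB_s = \alpha \int_0^t f(s) \, dB_s
+ \beta \int_0^t g(s) \, dB_s$ a.s. ($\alpha$ and $\beta$ are constants).

(c) If $f \in \mathcal{L}$, then $\int_0^t f(s) \, dB_s = \int_0^a f(s) I_{[0,t]}(s) \, dB_s$ a.s. for $t \le a$.

(d) If $f$ and $g$ are in $\mathcal{L}$, then for any $t$,

$$E\left[ \left( \int_0^t f(s) \, dB_s \right) \left( \int_0^t g(s) \, dB_s \right) \right] = E\left[ \int_0^t f(s) g(s) \, ds \right].$$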


Proof. Property (a): For $n > 0$ and $0 \le t \le n$, we define $Y_{(n,t)} = \int_0^t
f(s) I_{[0,n]}(s) \, dB_s$ using the results in 9.7.3. This is possible since $f I_{[0,n]}$ is
in $\mathcal{L}_n$. Property (e) of Theorem 9.7.3 assures that, on $[0, n]$, the processes
$Y_{(n+1,t)}$ and $Y_{(n,t)}$ are indistinguishable, and we can define the stochastic
integrals $Y_t = \int_0^t f(s) \, dB_s$ for any $t \ge 0$ by taking $Y_t = Y_{(n,t)}$ on $(n - 1, n]$.
The process $Y_t$ is then a martingale that is continuous except possibly for $\omega$ in
the set $\bigcup_n \{Y_{(n,n-1)} \ne Y_{(n-1,n-1)}\}$, which has probability 0. We can therefore
change the values of the $Y_t$'s on this set (for example take $Y_t = 0$) to get a
continuous version of the stochastic integrals.

The other properties are true for $g \in \mathcal{S}_a$; they extend to $g \in \mathcal{L}_a$, and
$g \in \mathcal{L}$. □


Problems 


1. Let $T$ be a stopping time for the family of σ-fields $(\mathcal{B}_t)_{t \ge 0}$, and denote
by $[0, T]$ the stochastic interval $\{(t, \omega): 0 \le t \le T(\omega)\}$.

(a) Show that the left-continuous process $I_{[0,T]}$ is measurable and
nonanticipating. Therefore if $f \in \mathcal{L}$ we can consider the two processes
$Y_t = \int_0^t f \, dB_s$ and $Z_t = \int_0^t f I_{[0,T]} \, dB_s$.

(b) Show that, if $a > 0$ and $f \in \mathcal{S}_a$, then the processes $(Y_{t \wedge T})_{0 \le t \le a}$ and
$(Z_t)_{0 \le t \le a}$ are indistinguishable. (Show it first for a stopping time that
takes only a finite number of values.)

(c) Extend this property to the processes $f \in \mathcal{L}_a$ and then to the
processes $f \in \mathcal{L}$.

2. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $(\mathcal{F}_t)_{t \ge 0}$ a nondecreasing,
right-continuous family of sub-σ-fields of $\mathcal{F}$. A process $f(s, \omega)$ defined on
$\mathbb{R}^+ \times \Omega$ is progressively measurable iff, for any $t \ge 0$, the restriction of
$f(s, \omega)$ to $[0, t] \times \Omega$ is $\mathcal{B}([0, t]) \times \mathcal{F}_t$ measurable.

(a) Show that a progressively measurable process is measurable and
non-anticipating.

(b) Show that a right-continuous and nonanticipating process is
progressively measurable.


3. Let $f(t, \omega)$ be a progressively measurable process such that, for each
$t \ge 0$, $\int_0^t f^2(s, \omega) \, ds < \infty$ a.s. We want to extend the notion of stochastic
integral to the process $f$.

(a) For each integer $n > 0$, let $T_n(\omega) = \inf\{t: \int_0^t f^2(s, \omega) \, ds \ge n\}$.
[$T_n = \infty$ if the set $\{t: \int_0^t f^2(s, \omega) \, ds \ge n\}$ is empty.] Show that the
$T_n$ are stopping times for the family of σ-fields $(\mathcal{B}_t')_{t \ge 0}$, and that
$\lim_n T_n = \infty$ a.s.

(b) We denote by $[0, T_n]$ the stochastic interval $\{(t, \omega): 0 \le t \le T_n(\omega)\}$.
Show that $f I_{[0,T_n]}$ is in $\mathcal{L}$.

(c) Let $Y_{(n,t)} = \int_0^t f I_{[0,T_n]} \, dB_s$ as defined in Theorem 9.7.4. Show that
the processes $Y_{(n, t \wedge T_n)}$ and $Y_{(n+1, t \wedge T_n)}$ are indistinguishable.

(d) Show that there exists a continuous process $(Y_t)_{t \ge 0}$ such that, for
each $n$, the processes $(Y_{t \wedge T_n})_{t \ge 0}$ and $(Y_{(n, t \wedge T_n)})_{t \ge 0}$ are
indistinguishable.

We now define $\int_0^t f(s) \, dB_s$ by $\int_0^t f(s) \, dB_s = Y_t$. [The process
$(Y_t)_{t \ge 0}$ is not necessarily a martingale, but it is a local martingale
in the sense that there exists a nondecreasing sequence of stopping
times $T_1 \le T_2 \le \cdots \le T_n \le \cdots$ such that $\lim_n T_n = \infty$ and each
$(Y_{t \wedge T_n})_{t \ge 0}$ is a martingale.]

(e) Let $T$ be a stopping time such that $f I_{[0,T]} \in \mathcal{L}$. Let $Z_t = \int_0^t
f I_{[0,T]} \, dB_s$, as defined in 9.7.4. Use Problem 1 to show that the
processes $(Z_t)_{t \ge 0}$ and $(Y_{t \wedge T})_{t \ge 0}$ are indistinguishable.

(f) Let $S_n$, $n = 0, 1, \ldots$, be a nondecreasing sequence of stopping times
such that $\lim_n S_n = \infty$ and each process $f I_{[0,S_n]}$ is in $\mathcal{L}$. Just as
in (d), we could use the stopping times $S_n$ to define the integrals
$\int_0^t f \, dB_s$. Show that the integrals do not depend on the sequence of
stopping times used in the construction.

4. Generalize Problem 3 to measurable, nonanticipating processes such that,
for each $t \ge 0$, $\int_0^t f^2(s, \omega) \, ds < \infty$ a.s. [Use an argument similar to the
proof of 9.7.3 to show that the integral $\int_0^t f^2(s, \omega) \, ds$ is $\mathcal{F}_t$-measurable.]


9.8 ITÔ'S DIFFERENTIATION FORMULA

Assume that a function $f(t)$, defined on $\mathbb{R}$, can be expanded in a Taylor
series. We know that $f(t) - f(0) = \int_0^t f'(s) \, ds$. We can heuristically justify this
statement; if $0 = t_0 < t_1 < \cdots < t_k = t$ is a partition of $[0, t]$, then

$$f(t) - f(0) = \sum_i [f(t_{i+1}) - f(t_i)] = \sum_i \Big[ f'(t_i)(t_{i+1} - t_i) + \sum_{n \ge 2} \frac{f^{(n)}(t_i)}{n!} (t_{i+1} - t_i)^n \Big].$$

The first term $\sum_i f'(t_i)(t_{i+1} - t_i)$ converges to $\int_0^t f'(s) \, ds$, and, for $n \ge 2$,
each term $\sum_i (f^{(n)}(t_i)/n!)(t_{i+1} - t_i)^n$ should go to 0, since $\sum_i |t_{i+1} - t_i|^n
\le \max_i |t_{i+1} - t_i|^{n-1} \sum_i |t_{i+1} - t_i| \to 0$ when the partition of $[0, t]$ gets finer.
Therefore, if everything goes well, the sum over $n$ of those terms also goes
to 0. This heuristic reasoning would still apply if we considered a continuous
function $G(s)$ having finite variation on any finite interval:

$$f(G(t)) - f(G(0)) = \sum_i [f(G(t_{i+1})) - f(G(t_i))]$$

$$= \sum_i f'(G(t_i))(G(t_{i+1}) - G(t_i)) + \sum_{n \ge 2} \sum_i \frac{f^{(n)}(G(t_i))}{n!} (G(t_{i+1}) - G(t_i))^n.$$

The first term converges to the Lebesgue-Stieltjes integral $\int_0^t f'(G(s)) \, dG(s)$,
and, for $n \ge 2$, $\sum_i f^{(n)}(G(t_i))(G(t_{i+1}) - G(t_i))^n$ should go to 0, since
$\sum_i |G(t_{i+1}) - G(t_i)|^n \le \max_i |G(t_{i+1}) - G(t_i)|^{n-1} \sum_i |G(t_{i+1}) - G(t_i)|$, and
$\sum_i |G(t_{i+1}) - G(t_i)|$ remains bounded when the partition becomes finer
($G$ is continuous and has finite variation on $[0, t]$).

What happens when we replace $G(s)$ by the paths $B_s$ of the Brownian
motion? The first term $\sum_i f'(B_{t_i})(B(t_{i+1}) - B(t_i))$ converges to $\int_0^t
f'(B(s)) \, dB(s)$. The term $\sum_i (f''(B_{t_i})/2)(B(t_{i+1}) - B(t_i))^2$ no longer converges
to 0, since the variation of the paths is a.s. infinite. The sum $\sum_i (B(t_{i+1}) -
B(t_i))^2$ converges in $L^2$ to $t$, so $\sum_i (f''(B_{t_i})/2)(B(t_{i+1}) - B(t_i))^2$ probably
converges to $\frac{1}{2} \int_0^t f''(B_s) \, ds$. For $n \ge 3$, $\sum_i f^{(n)}(B_{t_i})(B(t_{i+1}) - B(t_i))^n$ should
go to 0, since $\sum_i |B(t_{i+1}) - B(t_i)|^n \le \max_i |B(t_{i+1}) - B(t_i)|^{n-2} \sum_i (B(t_{i+1}) -
B(t_i))^2$. Now that we have intuitively derived Itô's formula, we are ready for
the rigorous proof.
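The contrast drawn in this heuristic, bounded sums of squared increments against unbounded sums of absolute increments, can be observed on a simulated path. The sketch below is an illustration and not part of the original text: it builds one discretized path on a fine dyadic grid and evaluates both sums along nested coarser dyadic partitions. The function name and the grid depth are assumptions made for the example.

```python
import math
import random


def variation_table(t=1.0, log2_n=14, seed=4):
    """Return rows (number of intervals, sum |increments|, sum increments^2)
    for nested dyadic partitions of [0, t] along one simulated Brownian path."""
    rng = random.Random(seed)
    n = 2 ** log2_n
    sd = math.sqrt(t / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sd))
    rows = []
    for k in range(4, log2_n + 1, 2):
        step = n // (2 ** k)
        points = path[::step]  # path values at the 2^k + 1 partition points
        increments = [y - x for x, y in zip(points, points[1:])]
        total_variation = sum(abs(d) for d in increments)
        quadratic_variation = sum(d * d for d in increments)
        rows.append((2 ** k, total_variation, quadratic_variation))
    return rows


if __name__ == "__main__":
    for intervals, tv, qv in variation_table():
        print(f"{intervals:6d} intervals: sum|dB| = {tv:8.3f}, sum dB^2 = {qv:.4f}")
```

As the partition is refined, the sum of absolute increments keeps growing (roughly like the square root of the number of intervals), while the sum of squared increments settles near $t$.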


9.8.1 (Itô's Differentiation Formula). Let $f$ be a continuous function on
$\mathbb{R}$. We assume that the first and second derivatives $f'$ and $f''$ exist and are
continuous. Then, for any $t \ge 0$,

$$f(B_t) - f(B_0) = \int_0^t f'(B_s) \, dB_s + \frac{1}{2} \int_0^t f''(B_s) \, ds \quad \text{a.s.}$$

[Since the processes $Y_t = f(B_t) - f(B_0)$ and $X_t = \int_0^t f'(B_s) \, dB_s +
\frac{1}{2} \int_0^t f''(B_s) \, ds$ are continuous, they will be indistinguishable.]
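Before the proof, the formula can be checked numerically on a single discretized path. The following sketch is an illustration under stated assumptions and not part of the original text: the stochastic integral is approximated by sums with the integrand evaluated at the left endpoint of each interval, the Lebesgue integral by a left Riemann sum, and the function name `ito_check` and the step count are hypothetical choices.

```python
import math
import random


def ito_check(f, df, d2f, t=1.0, n_steps=20000, seed=5):
    """Compare f(B_t) - f(B_0) with discretized versions of the two integrals.

    Returns (f(B_t) - f(0), approximate integral of f'(B) dB,
             approximate value of (1/2) * integral of f''(B) ds).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    b = 0.0
    stochastic_sum = 0.0  # sum of f'(B_{t_i}) (B_{t_{i+1}} - B_{t_i})
    drift_sum = 0.0       # sum of f''(B_{t_i}) (t_{i+1} - t_i)
    for _ in range(n_steps):
        incr = rng.gauss(0.0, sd)
        stochastic_sum += df(b) * incr
        drift_sum += d2f(b) * dt
        b += incr
    return f(b) - f(0.0), stochastic_sum, 0.5 * drift_sum


if __name__ == "__main__":
    lhs, stoch, drift = ito_check(math.sin, math.cos, lambda x: -math.sin(x))
    print("f(B_t) - f(B_0)        :", lhs)
    print("stochastic + drift part:", stoch + drift)
    print("stochastic part alone  :", stoch)
```

For $f(x) = x^2$ the correction term equals $t$ exactly, so dropping it leaves a visible gap between $B_t^2$ and $2\int_0^t B_s \, dB_s$; this is the term that has no counterpart in ordinary calculus.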


Proof. 1. The first step is to make sure that the integrals used in the statement
of Theorem 9.8.1 exist. The process $f'(B_t(\omega))$ is nonanticipating and
continuous; therefore it is progressively measurable (Problem 2 of 9.7). For
fixed $\omega$, the function $f'(B_s(\omega))$ is continuous in $s$; therefore, on an interval
$[0, t]$, it is bounded by a constant $C(\omega, t)$, and $\int_0^t f'(B_s(\omega))^2 \, ds$ is finite.
Consequently, we can, by Problem 3 of 9.7, define a continuous version of the
stochastic integrals $\int_0^t f'(B_s) \, dB_s$. A similar (in fact easier) argument shows
that the Lebesgue-Stieltjes integrals $\int_0^t f''(B_s) \, ds$ exist.

2. Let $t > 0$ and $t_0 = 0 < t_1 < \cdots < t_n = t$ be a partition $\mathcal{P}$ of $[0, t]$. We
denote by $\|\mathcal{P}\|$ the quantity $\max_i |t_{i+1} - t_i|$. Since the second derivative $f''$ is
continuous, there exists a value $\alpha_i(\omega)$ in between $B_{t_i}(\omega)$ and $B_{t_{i+1}}(\omega)$ such that

$$f(B_{t_{i+1}}) - f(B_{t_i}) = f'(B_{t_i})(B_{t_{i+1}} - B_{t_i}) + \tfrac{1}{2} f''(\alpha_i)(B_{t_{i+1}} - B_{t_i})^2.$$

There exists $s_i(\omega) \in [t_i, t_{i+1}]$ such that $B_{s_i}(\omega) = \alpha_i(\omega)$ [this is due to the
continuity of the function $s \to B_s(\omega)$; the random variable $s_i$ is not necessarily
a stopping time], and we have

$$f(B_t) - f(B_0) = \sum_i [f(B_{t_{i+1}}) - f(B_{t_i})]$$

$$= \sum_i f'(B_{t_i})(B_{t_{i+1}} - B_{t_i}) + \sum_i \tfrac{1}{2} f''(B_{s_i})(B_{t_{i+1}} - B_{t_i})^2.$$

We study separately the limit of each sum as the partition gets finer.

3. We want to show that $Z_t = \sum_i f'(B_{t_i})(B_{t_{i+1}} - B_{t_i})$ converges to $U_t = \int_0^t f'(B_s)\,dB_s$. The process $\sum_i f'(B_{t_i}) I_{(t_i, t_{i+1}]}(s)$ converges a.s. to the process $f'(B_s) I_{(0,t]}(s)$, but, since $\sup_{\omega,\, s \le t} |f'(B_s)|$ is not necessarily finite, we are not sure that

$$E\int_0^t \Big(\sum_i f'(B_{t_i}) I_{(t_i, t_{i+1}]}(s) - f'(B_s) I_{(0,t]}(s)\Big)^2 ds \to 0.$$

Consider the times $T_m$, $m \ge 1$, defined as follows:

$$T_m = \begin{cases} \inf\{s: s \ge 0,\ |B_s| \ge m\} & \text{if } |B_s| \ge m \text{ for some } s \ge 0, \\ \infty & \text{if } \{s: s \ge 0,\ |B_s| \ge m\} = \emptyset. \end{cases}$$

The set $\{T_m \le t\} = \{\sup_{r \in Q,\, r \le t} |B_r| \ge m\}$ (9.5.5) is in $\mathscr{F}_t$, and the $T_m$ form a nondecreasing sequence of stopping times that converges to $\infty$. For each $m$, the function $f'(x)$ is bounded on $[-m, m]$; therefore

$$E\int_0^t I_{[0, T_m]}(s)\,\Big|\sum_i f'(B_{t_i}) I_{(t_i, t_{i+1}]}(s) - f'(B_s) I_{(0,t]}(s)\Big|^2 ds \to 0$$

when $\|\mathscr{P}\| \to 0$; therefore $Z_{t \wedge T_m} \to U_{t \wedge T_m}$ in $L^2$ and in probability [we used Problem 3(e) of 9.7]. The terms $Z_t - Z_{t \wedge T_m}$ and $U_t - U_{t \wedge T_m}$ are zero except on the set $\{T_m < t\}$. By first choosing $m$ such that $P(T_m < t) < \varepsilon$, and then $\|\mathscr{P}\|$ small enough to have $P(|Z_{t \wedge T_m} - U_{t \wedge T_m}| > \varepsilon) < \varepsilon$, we obtain

$$P(|Z_t - U_t| > \varepsilon) \le P(|Z_{t \wedge T_m} - U_{t \wedge T_m}| > \varepsilon) + P(T_m < t) < 2\varepsilon.$$

Therefore $Z_t$ converges in probability to $U_t$.


4. We now study the term $\sum_i f''(B_{s_i})(B_{t_{i+1}} - B_{t_i})^2$. We want to show that if the sequence of partitions is well chosen, it converges to $\int_0^t f''(B_s)\,ds$ in probability.

For partitions $\mathscr{P}$ of $[0, t]$, let us consider the random variables

$$X(\mathscr{P}) = \sum_i [f''(B_{s_i}) - f''(B_{t_i})](B_{t_{i+1}} - B_{t_i})^2,$$
$$Y(\mathscr{P}) = \sum_i f''(B_{t_i})[(B_{t_{i+1}} - B_{t_i})^2 - (t_{i+1} - t_i)].$$

Since $\sum_i f''(B_{t_i})(t_{i+1} - t_i)$ converges for each $\omega$ to $\int_0^t f''(B_s)\,ds$, it is enough to show that $\sum_i f''(B_{s_i})(B_{t_{i+1}} - B_{t_i})^2 - \sum_i f''(B_{t_i})(t_{i+1} - t_i) = X(\mathscr{P}) + Y(\mathscr{P})$ converges to 0 in probability.

We first study the term $X(\mathscr{P})$.

$$|X(\mathscr{P})| \le \max_i |f''(B_{s_i}) - f''(B_{t_i})| \sum_i (B_{t_{i+1}} - B_{t_i})^2.$$

According to 9.3.4, we can choose a sequence of partitions $\mathscr{P}_n$, $n \ge 1$, of $[0, t]$ such that $\|\mathscr{P}_n\| \to 0$, and $\sum_i (B_{t_{i+1}} - B_{t_i})^2$ converges a.s. to $t$ ($L^2$ convergence implies the existence of a subsequence which converges a.s.). For each fixed $\omega$, the continuous function $s \to f''(B_s)$ is uniformly continuous on $[0, t]$, and $\max_i |f''(B_{s_i}) - f''(B_{t_i})| \to 0$ when $\|\mathscr{P}_n\|$ approaches 0. Therefore $X(\mathscr{P}_n) \to 0$ a.s. and in probability.

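We now study the term $Y(\mathscr{P})$. Let $T_m$, $m \ge 1$, be the stopping times defined in part 3 of the proof. For each $m$, we have

$$\sup_s |f''(B_{s \wedge T_m})| \le \sup_{-m \le x \le m} |f''(x)| = K_m.$$

We consider the random variables

$$Y(\mathscr{P}, T_m) = \sum_i f''(B_{t_i \wedge T_m})[(B_{t_{i+1} \wedge T_m} - B_{t_i \wedge T_m})^2 - (t_{i+1} \wedge T_m - t_i \wedge T_m)],$$

and denote by $V_i$ the random variable

$$V_i = f''(B_{t_i \wedge T_m})[(B_{t_{i+1} \wedge T_m} - B_{t_i \wedge T_m})^2 - (t_{i+1} \wedge T_m - t_i \wedge T_m)].$$

We have

$$E[Y(\mathscr{P}, T_m)^2] = E\Big[\Big(\sum_i V_i\Big)^2\Big] = \sum_i E[V_i^2] + 2\sum_{i<j} E[V_i V_j].$$

Since the processes $B_{s \wedge T_m}$ and $U_s = B^2_{s \wedge T_m} - s \wedge T_m$, $0 \le s \le t$, are martingales for the family of $\sigma$-fields $\mathscr{F}_{s \wedge T_m}$ (apply Problem 3 of 9.6 to the martingales $B_s$, $B_s^2 - s$ and the bounded stopping time $T_m \wedge t$), we have

$$E[(B_{t_{j+1} \wedge T_m} - B_{t_j \wedge T_m})^2 \mid \mathscr{F}_{t_j \wedge T_m}]$$
$$= E[B^2_{t_{j+1} \wedge T_m} - 2B_{t_{j+1} \wedge T_m} B_{t_j \wedge T_m} + B^2_{t_j \wedge T_m} \mid \mathscr{F}_{t_j \wedge T_m}]$$
$$= E[B^2_{t_{j+1} \wedge T_m} - B^2_{t_j \wedge T_m} \mid \mathscr{F}_{t_j \wedge T_m}]$$
$$= E[t_{j+1} \wedge T_m - t_j \wedge T_m \mid \mathscr{F}_{t_j \wedge T_m}].$$

Therefore, for $i < j$,

$$E[V_i V_j] = E[V_i E[V_j \mid \mathscr{F}_{t_j \wedge T_m}]] = 0$$

and

$$E[Y(\mathscr{P}, T_m)^2] = E\Big[\sum_i V_i^2\Big] \le K_m^2\, E\sum_i [(B_{t_{i+1} \wedge T_m} - B_{t_i \wedge T_m})^2 - (t_{i+1} \wedge T_m - t_i \wedge T_m)]^2$$
$$\le K_m^2\, E\sum_i [(B_{t_{i+1}} - B_{t_i})^2 - (t_{i+1} - t_i)]^2 \to 0$$

(see the proof of 9.3.4). Hence $Y(\mathscr{P}, T_m)$ converges to 0 in $L^2$ and in probability when $\|\mathscr{P}\| \to 0$. Since $\{Y(\mathscr{P}) \ne Y(\mathscr{P}, T_m)\} \subset \{T_m < t\}$, we can show, as in Part 3, that $Y(\mathscr{P}) \to 0$ in probability when $\|\mathscr{P}\| \to 0$. □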


9.8.2 Corollary. In particular we have

$$B_t^2 = \int_0^t 2B_s\,dB_s + t.$$

(We already knew that $B_t^2 - t$ is a martingale; Itô's formula gives us the exact form of the martingale.)
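As a numerical sanity check of 9.8.1 and 9.8.2, one may replace the integrals by left-endpoint sums over a fine uniform partition of a simulated path. This is only a sketch: the helper names `simulate_path` and `ito_right_side` are hypothetical, and the discretization is an assumption of the illustration, not part of the proof.

```python
import math
import random

def simulate_path(t, n_steps, rng):
    """Brownian path sampled at t_i = i*t/n_steps, with B_0 = 0."""
    sd = math.sqrt(t / n_steps)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def ito_right_side(path, t, d1, d2):
    """Left-endpoint sums: sum f'(B_ti)(B_ti+1 - B_ti) + (1/2) sum f''(B_ti) dt."""
    n_steps = len(path) - 1
    dt = t / n_steps
    stochastic = sum(d1(path[i]) * (path[i + 1] - path[i]) for i in range(n_steps))
    drift = 0.5 * sum(d2(path[i]) for i in range(n_steps)) * dt
    return stochastic + drift

if __name__ == "__main__":
    rng = random.Random(42)
    t = 1.0
    path = simulate_path(t, 100_000, rng)
    # f(x) = x^2 (Corollary 9.8.2): compare B_t^2 with sum 2 B dB + t.
    print(path[-1] ** 2, ito_right_side(path, t, lambda x: 2 * x, lambda x: 2.0))
    # f(x) = sin x: compare sin(B_t) - sin(B_0) with the discretized right side.
    print(math.sin(path[-1]), ito_right_side(path, t, math.cos, lambda x: -math.sin(x)))
```

For $f(x) = x^2$ the discrepancy between the two sides is exactly $\sum_i (B_{t_{i+1}} - B_{t_i})^2 - t$, which is the quantity controlled by 9.3.4.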
Problem

1. Let $\alpha(t, \omega)$ and $\beta(t, \omega)$ be two continuous, nonanticipating processes.

(a) Show that the process $X_t = \int_0^t \alpha(s, \omega)\,dB_s + \int_0^t \beta(s, \omega)\,ds$ is continuous and nonanticipating.

(b) Show that if the function $f(x)$, $x \in R$, has a continuous second derivative, then we have, for each $t$,

$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\alpha(s)\,dB_s + \int_0^t f'(X_s)\beta(s)\,ds + \frac12\int_0^t f''(X_s)\alpha^2(s)\,ds \quad \text{a.s.}$$

(Hint: use an argument similar to the proof of 9.8.1.)
We have proved Itô's formula in its simplest form; generalizations and applications to stochastic differential equations can be found in Wong (1973), McKean (1969), Itô and McKean (1965) and the chapter on diffusion in Gikhman and Skorokhod (1969). A general treatment of martingales and

APPENDICES


Appendix 1 THE SYMMETRIC RANDOM WALK IN $R^k$: A PRECISE ANALYSIS FOR $k = 1$ AND $k = 2$, AN INFORMAL APPROACH FOR $k \ge 3$
Consider a Markov chain with transition probabilities $p_{ij}$, and let $p_{ij}^{(n)}$ be the probability, starting from state $i$, that the process will be in state $j$ at time $n$, that is, after $n$ transitions. Let $f_{ii}^{(n)}$ be the probability, starting from $i$, that the first return to $i$ will occur at time $n$. We then have

$$p_{ii}^{(n)} = \sum_{k=1}^n f_{ii}^{(k)} p_{ii}^{(n-k)}.$$

A1.1 Theorem.

(a) $\displaystyle \sum_{n=1}^N p_{ii}^{(n)} \le \sum_{k=1}^N f_{ii}^{(k)} \sum_{r=0}^N p_{ii}^{(r)}$.

(b) State $i$ is recurrent if and only if $\sum_n p_{ii}^{(n)} = \infty$.

Proof.

(a) $$\sum_{n=1}^N p_{ii}^{(n)} = \sum_{n=1}^N \sum_{k=1}^n f_{ii}^{(k)} p_{ii}^{(n-k)} = \sum_{k=1}^N f_{ii}^{(k)} \sum_{n=k}^N p_{ii}^{(n-k)} \le \sum_{k=1}^N f_{ii}^{(k)} \sum_{r=0}^N p_{ii}^{(r)}.$$

(b) If $\sum_n p_{ii}^{(n)} < \infty$ then by the Borel-Cantelli Lemma, $i$ is visited only finitely many times and therefore must be transient. If the series diverges, let $f_{ii}$ be the probability, starting from $i$, that there will ever be a return to $i$. Then

$$f_{ii} = \sum_{k=1}^\infty f_{ii}^{(k)} \ge \sum_{k=1}^N f_{ii}^{(k)} \ge \frac{\sum_{n=1}^N p_{ii}^{(n)}}{\sum_{r=0}^N p_{ii}^{(r)}}$$

by part (a), and the fraction approaches 1 as $N \to \infty$. Therefore $i$ is recurrent. □
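The first-return decomposition used above can be inverted numerically. The sketch below assumes the one-dimensional symmetric walk, whose return probabilities are given in closed form in Theorem A1.2(a); the function names `return_probs_1d` and `first_return_probs` are hypothetical and serve only this illustration.

```python
from math import comb

def return_probs_1d(n_max):
    """p[n] = P(walk is back at its start at time n) for the symmetric walk on Z."""
    return [comb(n, n // 2) / 2 ** n if n % 2 == 0 else 0.0 for n in range(n_max + 1)]

def first_return_probs(p):
    """Solve p[n] = sum_{k=1}^n f[k] p[n-k] for f, using p[0] = 1 and f[0] = 0."""
    f = [0.0] * len(p)
    for n in range(1, len(p)):
        f[n] = p[n] - sum(f[k] * p[n - k] for k in range(1, n))
    return f

if __name__ == "__main__":
    N = 400
    p = return_probs_1d(N)
    f = first_return_probs(p)
    # Left side and right side of the inequality in A1.1(a).
    print(sum(p[1:]), sum(f) * sum(p))
    # Partial sum of first-return probabilities; it creeps up toward 1 as N grows.
    print(sum(f))
```

The slow approach of the partial sum to 1 reflects the fact that the fraction in part (b) tends to 1 only because $\sum_n p_{ii}^{(n)}$ diverges.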


The symmetric random walk in $R^k$ is a Markov chain whose state space consists of $k$-tuples of integers. At each transition, exactly one coordinate changes, by $\pm 1$ with equal probability. The probability of moving from $(a_1, \ldots, a_i, \ldots, a_k)$ to $(a_1, \ldots, a_i + 1, \ldots, a_k)$, as well as the probability of moving from $(a_1, \ldots, a_i, \ldots, a_k)$ to $(a_1, \ldots, a_i - 1, \ldots, a_k)$, is $1/(2k)$ for $i = 1, \ldots, k$. Thus in one dimension, we have a particle that moves right or left with equal probability. In two dimensions, the particle moves right, left, up or down, with probability $\frac14$ in each case. In three dimensions, a particle at $(x, y, z)$ can move to $(x + 1, y, z)$, $(x - 1, y, z)$, $(x, y + 1, z)$, $(x, y - 1, z)$, $(x, y, z + 1)$, or $(x, y, z - 1)$, and each outcome has probability $\frac16$.


A1.2 Theorem.

(a) When $k = 1$ we have $p_{ii}^{(2n)} = \binom{2n}{n}\left(\frac12\right)^{2n}$, and the states are recurrent.

(b) When $k = 2$ we have $p_{ii}^{(2n)} = \left(\frac14\right)^{2n}\binom{2n}{n}^2$, and again the states are recurrent.

(c) When $k \ge 3$, the states are transient.


Proof. (a) Starting from $i$, the particle will be at $i$ after $2n$ steps iff it has taken exactly $n$ steps to the right and $n$ steps to the left. Since the $n$ positions where a step to the right occurs can be chosen in $\binom{2n}{n}$ ways, the formula for $p_{ii}^{(2n)}$ follows. Now $\binom{2n}{n} = (2n)!/(n!\,n!)$ and by Stirling's formula, $(2n)! \sim (2n)^{2n} e^{-2n}\sqrt{2\pi \cdot 2n}$ and $n! \sim n^n e^{-n}\sqrt{2\pi n}$. Thus

$$\binom{2n}{n} \sim \frac{2^{2n}}{\sqrt{\pi n}}, \qquad p_{ii}^{(2n)} \sim \frac{1}{\sqrt{\pi n}},$$

and since $\sum_n n^{-1/2} = \infty$, the states are recurrent.

(b) Starting from $i$, the particle will be at $i$ after $2n$ steps iff for some $t$ between 0 and $n$, it has taken exactly $t$ steps right, $t$ steps left, $n - t$ steps up and $n - t$ steps down. Therefore,

$$p_{ii}^{(2n)} = \sum_{t=0}^n \frac{(2n)!}{t!\,t!\,(n-t)!\,(n-t)!}\left(\frac14\right)^{2n}.$$

Multiply and divide by $n!\,n!$ to obtain

$$p_{ii}^{(2n)} = \left(\frac14\right)^{2n}\binom{2n}{n}\sum_{t=0}^n \binom{n}{t}\binom{n}{n-t}, \qquad \text{since } \binom{n}{t} = \binom{n}{n-t}.$$

Now in selecting $n$ objects out of $2n$ we can choose $t$ from the first $n$ and $n - t$ from the second $n$, so the summation is simply $\binom{2n}{n}$. This establishes the desired formula for $p_{ii}^{(2n)}$. By part (a), $p_{ii}^{(2n)} \sim 1/(\pi n)$, and again the states are recurrent.

We now argue intuitively to justify that for all $k \ge 3$, $p_{ii}^{(2kn)} = O(n^{-k/2})$, that is, for some constant $C$ we have $p_{ii}^{(2kn)} \le C/n^{k/2}$ for all sufficiently large $n$. In a sequence of $2kn$ steps, roughly $2n$ will involve a change in the first coordinate, and in the remaining $2(k - 1)n$ steps, a coordinate other than the first will change. Thus we can think of decomposing the $k$-dimensional walk into two subwalks of dimensions 1 and $k - 1$, and both subwalks must be back in their initial state at the end. We can then complete the heuristic argument as follows to show that the states are transient for all $k \ge 3$. [A formal proof by induction can be constructed by defining $b_k^{(2n)}$ as the probability that a symmetric $k$-dimensional random walk is back at its original state after $2n$ steps, and then showing that

$$b_k^{(2n)} = \sum_{s=0}^n \binom{2n}{2s}\left(\frac1k\right)^{2s}\left(1 - \frac1k\right)^{2n-2s} b_1^{(2s)}\, b_{k-1}^{(2n-2s)}.$$

(There must be $2s$ steps in which the first coordinate changes and is back at its original value at time $2n$.)]

As we have seen in the proof of Theorem A1.2, $p_{ii}^{(2n)} \sim 1/\sqrt{n\pi}$ for the 1-dimensional walk, and for the $(k - 1)$-dimensional walk we have (by induction hypothesis):

$$p_{ii}^{(2(k-1)n)} = O([(k-1)n]^{-(k-1)/2}) = O(n^{-(k-1)/2}).$$

Thus for the $k$-dimensional walk,

$$p_{ii}^{(2kn)} = O(n^{-1/2}\, n^{-(k-1)/2}) = O(n^{-k/2}).$$

Since $\sum_n n^{-k/2} < \infty$ for $k \ge 3$, the states are transient. □
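The return probabilities of the $k$-dimensional walk can be computed by conditioning on the number of steps that change the first coordinate. The sketch below is illustrative only: the function name `b` and the recursive layout are choices made here, and the code is checked against the closed form of Theorem A1.2(b), not derived from it.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def b(k, n):
    """Probability that the symmetric walk on Z^k is back at its start after 2n steps."""
    if n == 0:
        return 1.0
    if k == 1:
        return comb(2 * n, n) / 4 ** n
    total = 0.0
    for s in range(n + 1):
        # 2s of the 2n steps move the first coordinate; the other 2n - 2s steps
        # form a symmetric walk on the remaining k - 1 coordinates.
        weight = comb(2 * n, 2 * s) * (1 / k) ** (2 * s) * (1 - 1 / k) ** (2 * n - 2 * s)
        total += weight * b(1, s) * b(k - 1, n - s)
    return total

if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, b(2, n), comb(2 * n, n) ** 2 / 16 ** n)   # recursion vs. A1.2(b)
    # Partial sums of the k = 3 return probabilities; by A1.1(b) and A1.2(c)
    # they should stay bounded as more terms are added.
    print(sum(b(3, n) for n in range(1, 61)))
```

The rescaled values $n^{3/2}\, b(3, n)$ staying bounded is consistent with the $O(n^{-k/2})$ estimate, although a finite computation cannot establish the asymptotics.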
Appendix 2 SEMICONTINUOUS FUNCTIONS

If $f_1, f_2, \ldots$ are continuous maps from the metric space $\Omega$ to the extended reals $\overline{R}$, and $f_n(x)$ increases to a limit $f(x)$ for each $x$, $f$ need not be continuous; however, $f$ is lower semicontinuous. Functions of this type play an important role in many aspects of analysis and probability.


A2.1 Definition. Let $\Omega$ be a metric space. The function $f: \Omega \to \overline{R}$ is said to be lower semicontinuous (LSC) on $\Omega$ iff $\{x \in \Omega: f(x) > a\}$ is open in $\Omega$ for each $a \in R$, upper semicontinuous (USC) on $\Omega$ iff $\{x \in \Omega: f(x) < a\}$ is open in $\Omega$ for each $a \in R$. Thus $f$ is LSC iff $-f$ is USC. Note that $f$ is continuous iff it is both LSC and USC.

We have the following criterion for semicontinuity. 


A2.2 Theorem. The function $f$ is LSC on $\Omega$ iff, for each sequence $\{x_n\}$ converging to a point $x \in \Omega$, we have $\liminf_n f(x_n) \ge f(x)$, where $\liminf_n f(x_n)$ means $\sup_n \inf_{k \ge n} f(x_k)$. Hence $f$ is USC iff $\limsup_n f(x_n) \le f(x)$ when $x_n \to x$.


Proof. Let $f$ be LSC. If $x_n \to x$ and $b < f(x)$, then $x \in f^{-1}(b, \infty]$, an open subset of $\Omega$, hence eventually $x_n \in f^{-1}(b, \infty]$, that is, $f(x_n) > b$ eventually. Thus $\liminf_n f(x_n) \ge f(x)$. Conversely, if $x_n \to x$ implies

$$\liminf_n f(x_n) \ge f(x),$$

we show that $V = \{x: f(x) > a\}$ is open. Let $x_n \to x$, where $f(x) > a$. Then $\liminf_n f(x_n) > a$, hence $f(x_n) > a$ eventually, that is, $x_n \in V$ eventually. Thus $V$ is open. □


We now prove a few properties of semicontinuous functions. 


A2.3 Theorem. Let $f$ be LSC on the compact metric space $\Omega$. Then $f$ attains its infimum. (Hence if $f$ is USC on the compact metric space $\Omega$, $f$ attains its supremum.)


Proof. If $b = \inf f$, there is a sequence of points $x_n \in \Omega$ with $f(x_n) \to b$. By compactness, we have a subsequence $x_{n_k}$ converging to some $x \in \Omega$. Since $f$ is LSC, $\liminf_k f(x_{n_k}) \ge f(x)$. But $f(x_{n_k}) \to b$, so that $f(x) \le b$; consequently $f(x) = b$. □


A2.4 Theorem. If $f_i$ is LSC on $\Omega$ for each $i \in I$, then $\sup_i f_i$ is LSC; if $I$ is finite, then $\min_i f_i$ is LSC. (Hence if $f_i$ is USC for each $i$, then $\inf_i f_i$ is USC, and if $I$ is finite, then $\max_i f_i$ is USC.)

Proof. Let $f = \sup_i f_i$; then $\{x: f(x) > a\} = \bigcup_{i \in I}\{x: f_i(x) > a\}$; hence $\{x: f(x) > a\}$ is open. If $g = \min(f_1, f_2, \ldots, f_n)$, then

$$\{x: g(x) > a\} = \bigcap_{i=1}^n \{x: f_i(x) > a\}$$

is open. □

A2.5 Theorem. Let $f: \Omega \to \overline{R}$, $\Omega$ any metric space, $f$ arbitrary. Define

$$\underline{f}(x) = \liminf_{y \to x} f(y), \quad x \in \Omega,$$

that is,

$$\underline{f}(x) = \sup_V \inf_{y \in V} f(y),$$

where $V$ ranges over all open balls with center at $x$ and radius $1/n$, $n = 1, 2, \ldots$. Then $\underline{f}$ is LSC on $\Omega$ and $\underline{f} \le f$; furthermore, if $g$ is LSC on $\Omega$ and $g \le f$, then $g \le \underline{f}$. Thus $\underline{f}$, called the lower envelope of $f$, is the sup of all LSC functions that are less than or equal to $f$ (there is always at least one such function, namely the function constant at $-\infty$).

Similarly, if $\overline{f}(x) = \limsup_{y \to x} f(y) = \inf_V \sup_{y \in V} f(y)$, then $\overline{f}$, the upper envelope of $f$, is USC and $\overline{f} \ge f$; in fact $\overline{f}$ is the inf of all USC functions that are greater than or equal to $f$.


Proof. It suffices to consider $\underline{f}$. Let $\{x_n\}$ be a sequence in $\Omega$ with $x_n \to x$ and $\liminf_n \underline{f}(x_n) < b < \underline{f}(x)$. If $V$ is a neighborhood of $x$, we can choose $n$ such that $x_n \in V$ and $\underline{f}(x_n) < b$. Since $V$ is also a neighborhood of $x_n$, we have

$$b > \underline{f}(x_n) \ge \inf_{y \in V} f(y),$$

so

$$\underline{f}(x) = \sup_V \inf_{y \in V} f(y) \le b < \underline{f}(x),$$

a contradiction. By A2.2, $\underline{f}$ is LSC, and $\underline{f} \le f$ by definition of $\underline{f}$. Finally, if $g$ is LSC, $g \le f$, then $\underline{f}(x) = \liminf_{y \to x} f(y) \ge \liminf_{y \to x} g(y) \ge g(x)$ since $g$ is LSC. [If $\sup_V \inf_{y \in V} g(y) < b < g(x)$, then for each $V$ pick $x_V \in V$ with $g(x_V) < b$. If we do this for $V = V_n = B(x, 1/n)$, $n = 1, 2, \ldots$, the $x_n = x_{V_n}$ form a sequence converging to $x$, while $\liminf_n g(x_n) \le b < g(x)$, contradicting A2.2.] □




A2.6 Theorem. Let $\Omega$ be a metric space, $f$ a LSC function on $\Omega$. There is a sequence of continuous functions $f_n: \Omega \to \overline{R}$ such that $f_n \uparrow f$. (Thus if $f$ is USC, there is a sequence of continuous functions $f_n \downarrow f$.) If $|f| \le M < \infty$, the $f_n$ may be chosen so that $|f_n| \le M$ for all $n$.


Proof. (Following Hausdorff, 1962). First assume $f \ge 0$ and finite-valued. If $d$ is the metric on $\Omega$, define $g(x) = \inf\{f(z) + t\,d(x, z): z \in \Omega\}$, where $t > 0$ is fixed; then $0 \le g \le f$ since $g(x) \le f(x) + t\,d(x, x) = f(x)$.

If $x, y \in \Omega$, then $f(z) + t\,d(x, z) \le f(z) + t\,d(y, z) + t\,d(x, y)$. Take the inf over $z$ to obtain $g(x) \le g(y) + t\,d(x, y)$. By symmetry,

$$|g(x) - g(y)| \le t\,d(x, y),$$

hence $g$ is continuous on $\Omega$.

Now set $t = n$; in other words let $f_n(x) = \inf\{f(z) + n\,d(x, z): z \in \Omega\}$. Then $0 \le f_n \uparrow h \le f$. But given $\varepsilon > 0$, for each $n$ we can choose $z_n \in \Omega$ such that

$$f_n(x) + \varepsilon > f(z_n) + n\,d(x, z_n) \ge n\,d(x, z_n).$$

But $f_n(x) + \varepsilon \le f(x) + \varepsilon$, and it follows that $d(x, z_n) \to 0$. Since $f$ is LSC, $\liminf_{n \to \infty} f(z_n) \ge f(x)$; thus $f(z_n) > f(x) - \varepsilon$ eventually. But now

$$f_n(x) > f(z_n) - \varepsilon + n\,d(x, z_n) \ge f(z_n) - \varepsilon > f(x) - 2\varepsilon \quad \text{for large enough } n.$$

It follows that $0 \le f_n \uparrow f$. If $|f| \le M < \infty$, then $f + M$ is LSC and nonnegative; if $0 \le g_n \uparrow f + M$, then $f_n = g_n - M \uparrow f$ and $|f_n| \le M$.

In general, let $h(x) = \frac{\pi}{2} + \arctan x$, $x \in \overline{R}$; then $h$ is an order-preserving homeomorphism of $\overline{R}$ and $[0, \pi]$. If $f$ is LSC, then $h \circ f$ is finite-valued, LSC, and nonnegative, so that we can find continuous functions $g_n$ such that $g_n \uparrow h \circ f$. Let $f_n = h^{-1} \circ g_n$; then $f_n \uparrow f$. □
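The construction $f_n(x) = \inf_z \{f(z) + n\,d(x, z)\}$ in the first part of the proof is easy to experiment with. In the sketch below, $\Omega$ is replaced by a finite grid in $[0, 1]$ with $d(x, z) = |x - z|$, and $f$ is a lower semicontinuous step function; both choices, and the names `hausdorff_approximation` and `step`, are assumptions made purely for illustration.

```python
def hausdorff_approximation(f, points, n):
    """f_n(x) = min over z in points of f(z) + n*|x - z| (inf taken over the grid)."""
    return [min(f(z) + n * abs(x - z) for z in points) for x in points]

def step(x):
    """Lower semicontinuous step: 0 on [0, 1/2], 1 on (1/2, 1]."""
    return 1.0 if x > 0.5 else 0.0

points = [i / 100 for i in range(101)]

if __name__ == "__main__":
    for n in (1, 10, 200):
        fn = hausdorff_approximation(step, points, n)
        # Values at x = 0.50, 0.55 and 1.00: the ramp after 1/2 steepens with n.
        print(n, fn[50], fn[55], fn[100])
```

On this grid each $f_n$ is Lipschitz with constant $n$, the sequence is nondecreasing in $n$, and it is bounded above by $f$, which matches the three facts established in the proof.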


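Appendix 3 COMPLETION OF THE PROOF OF THEOREM 7.3.2

The "if" part of the theorem remains to be proved. We may assume without loss of generality that $EX_k = 0$. The uan condition is equivalent to the statement that $h_k(u/c_n) \to 1$ as $n \to \infty$ uniformly in $k = 1, \ldots, n$, that is,

$$\max_{1 \le k \le n}\left|1 - h_k\!\left(\frac{u}{c_n}\right)\right| \to 0 \quad \text{as } n \to \infty.$$

Furthermore, in this case the convergence is uniform for $u$ in any bounded interval (see Problem 3, Section 7.3).

(a) We show first that

$$\sum_{k=1}^n \left[h_k\!\left(\frac{u}{c_n}\right) - 1\right] + \frac{u^2}{2} \to 0.$$

Let $h_Y$ denote the characteristic function of the random variable $Y$. By the normal convergence,

$$h_{T_n}(u) = \prod_{k=1}^n h_k\!\left(\frac{u}{c_n}\right) = \prod_{k=1}^n \left[1 + \left(h_k\!\left(\frac{u}{c_n}\right) - 1\right)\right] \to e^{-u^2/2},$$

and the convergence is uniform for $u$ in any bounded interval (see 7.2.9). If "Log" denotes the principal branch of the logarithm, then

$$\sum_{k=1}^n \operatorname{Log}\left[1 + \left(h_k\!\left(\frac{u}{c_n}\right) - 1\right)\right]$$

is the (necessarily unique) continuous logarithm of $h_{T_n}(u)$ having the value 0 at $u = 0$. Thus

$$\sum_{k=1}^n \operatorname{Log}\left[1 + \left(h_k\!\left(\frac{u}{c_n}\right) - 1\right)\right] \to -\frac{u^2}{2}.$$

Therefore, since $\operatorname{Log}(1 + z) = z + \theta |z|^2$ with $|\theta| \le 1$ when $|z| \le \frac12$,

$$\frac{u^2}{2} + \sum_{k=1}^n \left[h_k\!\left(\frac{u}{c_n}\right) - 1\right] + \sum_{k=1}^n \theta_{nk}\left|h_k\!\left(\frac{u}{c_n}\right) - 1\right|^2 \to 0,$$

where each $|\theta|$ is at most 1. But if $a_n + b_n \to 0$, then $\limsup_{n \to \infty} |a_n| = \limsup_{n \to \infty} |b_n|$ (eventually $|a_n| \le |b_n| + \varepsilon$, and thus $\limsup_{n \to \infty} |a_n| \le \limsup_{n \to \infty} |b_n|$); hence

$$\limsup_{n \to \infty}\left|\frac{u^2}{2} + \sum_{k=1}^n \left[h_k\!\left(\frac{u}{c_n}\right) - 1\right]\right| = \limsup_{n \to \infty}\left|\sum_{k=1}^n \theta_{nk}\left|h_k\!\left(\frac{u}{c_n}\right) - 1\right|^2\right|$$
$$\le \limsup_{n \to \infty}\ \max_{1 \le k \le n}\left|h_k\!\left(\frac{u}{c_n}\right) - 1\right| \sum_{k=1}^n \left|h_k\!\left(\frac{u}{c_n}\right) - 1\right|.$$

Now

$$\max_{1 \le k \le n}\left|h_k\!\left(\frac{u}{c_n}\right) - 1\right| \to 0$$

and

$$\sum_{k=1}^n \left|h_k\!\left(\frac{u}{c_n}\right) - 1\right| = \sum_{k=1}^n \left|\int_{-\infty}^{\infty}\left[\exp\!\left(\frac{iux}{c_n}\right) - 1 - \frac{iux}{c_n}\right] dF_k(x)\right| \le \sum_{k=1}^n \int_{-\infty}^{\infty} \frac{u^2 x^2}{2c_n^2}\,dF_k(x) = \frac{u^2}{2c_n^2}\sum_{k=1}^n \sigma_k^2 = \frac{u^2}{2}.$$

This proves (a).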
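(b) For any $\varepsilon > 0$,

$$\limsup_{n \to \infty}\ \frac{u^2}{2}\left[1 - \frac{1}{c_n^2}\sum_{k=1}^n \int_{|x| < \varepsilon c_n} x^2\,dF_k(x)\right] \le \frac{2}{\varepsilon^2}.$$

For by (a),

$$\frac{u^2}{2} - \sum_{k=1}^n \operatorname{Re}\left[1 - h_k\!\left(\frac{u}{c_n}\right)\right] \to 0,$$

that is,

$$\frac{u^2}{2} - \sum_{k=1}^n \int_{-\infty}^{\infty}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x) \to 0,$$

or equivalently,

$$\frac{u^2}{2} - \sum_{k=1}^n \int_{|x| < \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x) - \sum_{k=1}^n \int_{|x| \ge \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x) \to 0.$$

Again, noting that $a_n + b_n \to 0$ implies $\limsup_{n \to \infty} |a_n| = \limsup_{n \to \infty} |b_n|$, we have

$$\limsup_{n \to \infty}\left|\frac{u^2}{2} - \sum_{k=1}^n \int_{|x| < \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x)\right| = \limsup_{n \to \infty}\left|\sum_{k=1}^n \int_{|x| \ge \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x)\right|. \tag{1}$$

But

$$\sum_{k=1}^n \int_{|x| < \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x) \le \sum_{k=1}^n \int_{|x| < \varepsilon c_n} \frac{u^2 x^2}{2c_n^2}\,dF_k(x) \le \frac{u^2}{2c_n^2}\sum_{k=1}^n \sigma_k^2 = \frac{u^2}{2},$$

and thus the absolute values around the "lim sup" terms in (1) may be dropped. [This also shows that the quantity inside the brackets in (b) is nonnegative.] Consequently,

$$\limsup_{n \to \infty}\ \frac{u^2}{2}\left[1 - \frac{1}{c_n^2}\sum_{k=1}^n \int_{|x| < \varepsilon c_n} x^2\,dF_k(x)\right] \le \limsup_{n \to \infty}\left[\frac{u^2}{2} - \sum_{k=1}^n \int_{|x| < \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x)\right]$$
$$= \limsup_{n \to \infty} \sum_{k=1}^n \int_{|x| \ge \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x) \quad \text{by (1)}.$$

But

$$\sum_{k=1}^n \int_{|x| \ge \varepsilon c_n}\left(1 - \cos\frac{ux}{c_n}\right) dF_k(x) \le 2\sum_{k=1}^n \int_{|x| \ge \varepsilon c_n} dF_k(x) = 2\sum_{k=1}^n P\{|X_k| \ge \varepsilon c_n\}$$
$$\le 2\sum_{k=1}^n \frac{\sigma_k^2}{\varepsilon^2 c_n^2} \quad \text{by Chebyshev's inequality}$$
$$= \frac{2}{\varepsilon^2}, \quad \text{proving (b)}.$$

The proof of the theorem is completed by letting $u \to \infty$ in (b). □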
Appendix 4 PROOF OF THE CONVERGENCE OF TYPES THEOREM 7.3.4

(a) Let $G_n$, $G$, $F_n$, $F$ be the distribution functions of $X_n$, $X$, $Y_n$, $Y$, respectively.

We may select convergent subsequences $\{a_{n_k}\}$, $\{b_{n_k}\}$ such that

$$a_{n_k} \to a, \quad b_{n_k} \to b, \quad \text{where } 0 \le a \le \infty,\ -\infty \le b \le \infty.$$

We first show that $a < \infty$.

Suppose that $a = \infty$. Let $E = \{x \in R: \limsup_{k \to \infty}(a_{n_k} x + b_{n_k}) < \infty\}$ and let $c = \sup E$. (Take $c = -\infty$ if $E = \emptyset$.)


(i) If $x \in R$, $x < c$, then $a_{n_k} x + b_{n_k} \to -\infty$.


Proof. Let $x < u \le c$, with $u \in E$ ($u$ exists since $c = \sup E$). Then $u \in E$, $x < u$ implies $x \in E$ by definition of $E$. Now

$$a_{n_k} x + b_{n_k} = a_{n_k}(x - u) + a_{n_k} u + b_{n_k}; \qquad \limsup_{k \to \infty}(a_{n_k} u + b_{n_k}) < \infty$$

since $u \in E$, and $a_{n_k}(x - u) \to -\infty$ since $a_{n_k} \to a = \infty$ and $x - u < 0$. This proves (i).


(ii) If $x \in R$, $x < c$, then $G(x) = 0$.


Proof. Let $\varepsilon > 0$ be arbitrary. Choose $z \in R$ such that $F(z) < \varepsilon$ and $z$ is a continuity point of $F$. Then $F_{n_k}(z) \to F(z)$, so eventually $F_{n_k}(z) < \varepsilon$. By (i), eventually $a_{n_k} x + b_{n_k} < z$, so

$$G_{n_k}(x) = F_{n_k}(a_{n_k} x + b_{n_k}) \le F_{n_k}(z) < \varepsilon$$

for large enough $k$. Thus $G_{n_k}(x) \to 0$, $x < c$. Let $x < x' < c$, $x'$ a continuity point of $G$. Then $G(x') = \lim_{k \to \infty} G_{n_k}(x') = 0$, and since $G(x) \le G(x')$, we have $G(x) = 0$, proving (ii).


(iii) If $x \in R$, $x > c$, then $G(x) = 1$.


Proof. By definition of $E$, we find a subsequence $\{r_j\}$ such that $a_{r_j} x + b_{r_j} \to \infty$. Choose a continuity point $w$ of $F$ such that $F(w) > 1 - \varepsilon$. Eventually, $a_{r_j} x + b_{r_j} > w$, so $G_{r_j}(x) = F_{r_j}(a_{r_j} x + b_{r_j}) \ge F_{r_j}(w) > 1 - \varepsilon$. Thus $G_{r_j}(x) \to 1$ for $x > c$. If $c < y < x$, $y$ a continuity point of $G$, we have $G(x) \ge G(y) = \lim_{j \to \infty} G_{r_j}(y) = 1$, proving (iii).
It follows from (ii) and (iii) that $G$ is degenerate, a contradiction.


Next we show that $b$ is finite.

If $b_{n_k} \to \infty$, then $a_{n_k} x + b_{n_k} \to \infty$ for every $x \in R$, so the argument of (iii) may be repeated to show $G(x) = 1$, a contradiction. If $b_{n_k} \to -\infty$, then $a_{n_k} x + b_{n_k} \to -\infty$ for each $x \in R$, so the argument of (ii) shows $G(x) = 0$, a contradiction.

Now we show that $a > 0$.

Let $x$ be a continuity point of $G$, and let $\varepsilon_1, \varepsilon_2 > 0$ be such that $ax + b + \varepsilon_1$ and $ax + b - \varepsilon_2$ are continuity points of $F$. Then $a_{n_k} x + b_{n_k} \to ax + b$, so eventually $ax + b - \varepsilon_2 < a_{n_k} x + b_{n_k} < ax + b + \varepsilon_1$; hence

$$F_{n_k}(ax + b - \varepsilon_2) \le F_{n_k}(a_{n_k} x + b_{n_k}) = G_{n_k}(x) \le F_{n_k}(ax + b + \varepsilon_1).$$

Let $k \to \infty$ to obtain $F(ax + b - \varepsilon_2) \le G(x) \le F(ax + b + \varepsilon_1)$. Since $\varepsilon_1$ and $\varepsilon_2$ may be chosen arbitrarily small, $F((ax + b)^-) \le G(x) \le F(ax + b)$ for all continuity points $x$ of $G$.

If $a = 0$, then $F(b^-) \le G(x) \le F(b)$ for all continuity points, and hence for all $x \in R$. But then $F(b) = 1$, $F(b^-) = 0$ because $G(\infty) = 1$, $G(-\infty) = 0$. Thus $F$ is degenerate at $b$, a contradiction.

Finally, if $G$ is continuous at $x$ and $F$ is continuous at $ax + b$, we have just seen that $F((ax + b)^-) \le G(x) \le F(ax + b)$, so $G(x) = F(ax + b)$. Since there are only countably many real numbers $y$ such that $G$ is discontinuous at $y$ or $F$ is discontinuous at $ay + b$, it follows that $G(x) = F(ax + b)$ for all $x \in R$.

Now if we have other convergent subsequences $a_{m_k} \to a'$, $b_{m_k} \to b'$, the above argument shows that $0 < a' < \infty$, $-\infty < b' < \infty$, and $G(x) = F(ax + b) = F(a'x + b')$ for all $x \in R$, so that $a^{-1}(Y - b) \overset{d}{=} (a')^{-1}(Y - b')$. Random variables with the same distribution have the same characteristic function, therefore

$$\exp\!\left(-\frac{iub}{a}\right) h_Y\!\left(\frac{u}{a}\right) = \exp\!\left(-\frac{iub'}{a'}\right) h_Y\!\left(\frac{u}{a'}\right) \quad \text{for all } u.$$

Say $a < a'$, and set $k = a/a'$. Let $v = u/a$ to obtain

$$|h_Y(v)| = \left|h_Y\!\left(\frac{av}{a'}\right)\right| = |h_Y(kv)|$$

for all $v$. Thus

$$|h_Y(v)| = |h_Y(kv)| = |h_Y(k^2 v)| = \cdots = |h_Y(k^n v)| \to |h_Y(0)| = 1.$$

It follows that $Y$ is degenerate, a contradiction (see Problem 4, Section 7.1). Thus $a = a'$, so $e^{-iub/a} = e^{-iub'/a}$ for all sufficiently small $u$; hence $b = b'$. Therefore, $a_n \to a$, $b_n \to b$, proving 7.3.4(a).
Comment. If $a^{-1}(Y - b) \overset{d}{=} (a')^{-1}(Y - b')$, where $a$ and $a'$ are nonzero but not necessarily positive, then if $|a| < |a'|$ and we set $k = a/a'$, we obtain a contradiction as above. We can conclude that $|a| = |a'|$ but not that $a = a'$.


(b) If $a_n > 0$, $X_n$ and $Y_n$ are of the same positive type, and if $a_n < 0$, $X_n = -a_n^{-1}(-Y_n + b_n)$, hence $X_n$ and $-Y_n$ are of the same positive type.

Let $S_1 = \{n: a_n > 0\}$, $S_2 = \{n: a_n < 0\}$. If $S_1$ is infinite, part (a) shows that $X$ and $a^{-1}(Y - b)$ have the same distribution for some real $a$, $b$, $a > 0$, and

$$\lim_{\substack{n \to \infty \\ n \in S_1}} a_n = a, \qquad \lim_{\substack{n \to \infty \\ n \in S_1}} b_n = b.$$

Now suppose that $S_2$ is infinite. Then $Y_n \overset{d}{\to} Y$ implies $-Y_n \overset{d}{\to} -Y$ (use 7.2.9), and it follows from part (a) that for some real $a'$, $b'$, with $a' < 0$, we have $X \overset{d}{=} -(a')^{-1}(-Y + b')$, and

$$\lim_{\substack{n \to \infty \\ n \in S_2}} a_n = a', \qquad \lim_{\substack{n \to \infty \\ n \in S_2}} b_n = b'.$$

Now there are three possibilities:


Case 1. $S_1$ and $S_2$ are both infinite. Then since $a^{-1}(Y - b) \overset{d}{=} (a')^{-1}(Y - b') \overset{d}{=} X$, we have $|a| = |a'|$ [see the comment after the proof of (a) here]. Thus $|a_n| \to |a|$ and the result follows.

Case 2. $S_1$ is infinite, $S_2$ finite. Then $a_n \to a$, $b_n \to b$, and $X \overset{d}{=} a^{-1}(Y - b)$, proving the result.

Case 3. $S_1$ is finite, $S_2$ infinite. Then $a_n \to a'$, $b_n \to b'$, and $X \overset{d}{=} (a')^{-1}(Y - b')$, and the result follows. □
In this appendix, $u$ will denote a column vector with components $u_1, \ldots, u_n$, $x$ a column vector with components $x_1, \ldots, x_n$, and $X$ a random (column) vector with components $X_1, \ldots, X_n$. The superscript $t$ will indicate the transpose of a matrix. To avoid awkward special cases, we agree that normal with mean $\mu$ and variance 0 will mean degenerate at $\mu$.


A5.1 Definition. The random variables $X_1, \ldots, X_n$ are said to be jointly Gaussian (or the random vector $X = (X_1, \ldots, X_n)$ is said to be Gaussian) if the characteristic function of $X$ is

$$h(u_1, \ldots, u_n) = \exp[iu^t b]\exp[-\tfrac12 u^t K u] \tag{1}$$

where the $b_j$ are arbitrary real numbers and $K$ is a symmetric, nonnegative definite matrix (with real coefficients).


A much more concrete interpretation is possible. 


A5.2 Theorem. The random vector $X$ is Gaussian if and only if $X$ can be expressed as $AY + b$ where the $Y_j$ are independent normal random variables with 0 mean.


Proof. If $X = AY + b$, then $E[\exp(iu^t X)] = \exp[iu^t b]\,E[\exp(iu^t AY)]$. But

$$E[\exp(iv^t Y)] = E\Big[\prod_{k=1}^n \exp(iv_k Y_k)\Big] = \prod_{k=1}^n E[\exp(iv_k Y_k)] = \exp\Big[-\frac12\sum_{k=1}^n \lambda_k v_k^2\Big] = \exp[-\tfrac12 v^t D v]$$

where $D$ is a diagonal matrix whose entries are $\lambda_k = \operatorname{Var} Y_k$, $k = 1, \ldots, n$. Set $v = A^t u$; the characteristic function of $X$ is then given by

$$h(u_1, \ldots, u_n) = \exp[iu^t b - \tfrac12 u^t K u]$$

where $K = ADA^t$. Since the diagonal matrix $D$ is symmetric, so is $K$, and $K$ is nonnegative definite as well since $u^t K u = v^t D v = \sum_{k=1}^n \lambda_k v_k^2 \ge 0$.

Conversely, assume that $X$ is Gaussian, with a characteristic function given by (1) above. Let $A$ be an orthogonal matrix such that $A^t K A = D$, a diagonal matrix whose entries are the eigenvalues $\lambda_j$ of $K$. Let

$$Y = A^t(X - b).$$

Then

$$E[\exp(iu^t Y)] = \exp[-iu^t A^t b]\,E[\exp(iu^t A^t X)] = \exp[-\tfrac12 v^t K v] \quad \text{where } v = Au.$$

Thus

$$E[\exp(iu^t Y)] = \exp[-\tfrac12 u^t A^t K A u] = \exp[-\tfrac12 u^t D u] = \exp\Big[-\frac12\sum_{k=1}^n \lambda_k u_k^2\Big].$$

The form of the characteristic function of $Y$ shows that $Y_1, \ldots, Y_n$ are independent, and $Y_j$ is normal with mean 0 and variance $\lambda_j$. Since $A$ is orthogonal, we have $A^t = A^{-1}$, so that $X = AY + b$. □
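The representation $X = AY + b$ also gives a way to simulate a Gaussian vector with a prescribed $K$. The following sketch works out one hand-picked two-dimensional case, $K = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ with eigenvalues 3 and 1; the matrix, the mean vector and the function names are all assumptions chosen for illustration, and the eigen-decomposition is entered by hand instead of being computed.

```python
import math
import random

R = 1 / math.sqrt(2)
A = [[R, R],
     [R, -R]]            # orthogonal; columns are eigenvectors of K = [[2, 1], [1, 2]]
LAMBDAS = [3.0, 1.0]     # eigenvalues of K, so that K = A D A^t with D = diag(3, 1)
B_VEC = [1.0, -2.0]      # mean vector b (an arbitrary illustrative choice)

def sample_gaussian(rng):
    """One draw of X = A Y + b with Y_k independent normal(0, lambda_k)."""
    y = [rng.gauss(0.0, math.sqrt(lam)) for lam in LAMBDAS]
    return [sum(A[j][k] * y[k] for k in range(2)) + B_VEC[j] for j in range(2)]

def sample_mean_and_cov(draws):
    """Empirical mean vector and covariance matrix of a list of 2-vectors."""
    n = len(draws)
    mean = [sum(x[j] for x in draws) / n for j in range(2)]
    cov = [[sum((x[j] - mean[j]) * (x[k] - mean[k]) for x in draws) / n
            for k in range(2)] for j in range(2)]
    return mean, cov

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_gaussian(rng) for _ in range(100_000)]
    mean, cov = sample_mean_and_cov(draws)
    print(mean)   # should be close to b
    print(cov)    # should be close to K
```

The empirical mean and covariance approximating $b$ and $K$ is what Theorem A5.4 asserts for the exact distribution.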


A5.3 Corollary. For any column vector $b$ and symmetric, nonnegative definite matrix $K$, there is always a Gaussian vector $X$ whose characteristic function is given by A5.1 with the prescribed $b$ and $K$.


Proof. Let $A$ be an orthogonal matrix with $A^t K A = D$, the diagonal matrix of eigenvalues of $K$; then $K = ADA^t$. Let $X = AY + b$, where the $Y_j$ are independent and normal $(0, \lambda_j)$. The first part of the proof of Theorem A5.2 shows that the characteristic function of $X$ has the desired form. □


We may give a probabilistic interpretation of the vector b and the matrix K. 


A5.4 Theorem. If $X$ is Gaussian with characteristic function given by A5.1, then $E(X) = b$, in other words, $b_j$ is the expectation of $X_j$, $j = 1, \ldots, n$. Furthermore, $K$ is the covariance matrix of the $X_j$, that is, $K_{jk} = \operatorname{Cov}(X_j, X_k)$ for all $j, k$.


Proof. Let $X = AY + b$ as in Theorem A5.2. Since the $Y_j$ have finite second moments (in fact finite moments of all orders), so do the $X_j$. By linearity of the expectation we have $E(X) = b$. If $A$ is any matrix, we denote by $E(A)$ the matrix whose $jk$ entry is $E(a_{jk})$. Then the covariance matrix of the $X_j$ can be written as

$$E[(X - b)(X - b)^t] = E[AYY^tA^t] = A\,E[YY^t]\,A^t = ADA^t,$$

where $D$ is a diagonal matrix with entries $\lambda_k = \operatorname{Var} Y_k$. But by the first part of the proof of Theorem A5.2, $ADA^t = K$. □


We now show that if the covariance matrix K is nonsingular, then X has a 
density. 


A5.5 Theorem. Let $X$ be Gaussian with mean vector $b$ and covariance matrix $K$. If $K$ is nonsingular, then the $X_j - b_j$ are linearly independent, that is,

$$\text{if } \sum_{j=1}^n c_j(X_j - b_j) = 0 \text{ a.e., then } c_j = 0 \text{ for all } j.$$

Furthermore, $X$ has density $f$, where

$$f(x) = (2\pi)^{-n/2}(\det K)^{-1/2}\exp[-\tfrac12 (x - b)^t K^{-1}(x - b)].$$

Proof. Let $A$ be an orthogonal matrix such that $A^t K A = D$, and let $X = AY + b$ as in Theorem A5.2. If $K$ is nonsingular, then every eigenvalue $\lambda_k$ is strictly positive, so $Y$ has density $g$, where

$$g(y) = (2\pi)^{-n/2}(\lambda_1 \cdots \lambda_n)^{-1/2}\exp\Big[-\frac12\sum_{k=1}^n \frac{y_k^2}{\lambda_k}\Big] = (2\pi)^{-n/2}(\det K)^{-1/2}\exp[-\tfrac12 y^t D^{-1} y].$$

Now the Jacobian of the transformation $x = Ay + b$ is $\det A$, which is $\pm 1$ because $A$ is orthogonal. Since $y = A^t(x - b)$, $X$ has density $f$, where

$$f(x) = (2\pi)^{-n/2}(\det K)^{-1/2}\exp[-\tfrac12 (x - b)^t A D^{-1} A^t (x - b)].$$

But $A^t K A = D$, and it follows that $D^{-1} = A^t K^{-1} A$, and therefore $AD^{-1}A^t = K^{-1}$.

Now if $c^t(X - b) = \sum_{j=1}^n c_j(X_j - b_j) = 0$ a.e., then

$$0 = E[|c^t(X - b)|^2] = E[c^t(X - b)(X - b)^t c] = c^t E[(X - b)(X - b)^t]\,c = c^t K c.$$

But $K$ is nonsingular, and therefore positive definite, so $c_j$ must be 0 for all $j$. □


If $K$ is singular, the last part of the proof of Theorem A5.5 shows that the $X_j - b_j$ are linearly dependent. For $c^t K c = c^t ADA^t c = \sum_{k=1}^n \lambda_k a_k^2$ where $a = A^t c$. Since at least one $\lambda_k$ must be 0, we can choose a nonzero $a$, and hence a nonzero $c$, such that $c^t(X - b) = 0$ a.e. If, say, $X_1 - b_1, \ldots, X_r - b_r$ form a maximal linearly independent subset of $\{X_1 - b_1, \ldots, X_n - b_n\}$, then $(X_1, \ldots, X_r)$ has a density of the form given in Theorem A5.5, with $K$ replaced by the submatrix determined by the first $r$ rows and the first $r$ columns of $K$. The remaining random variables $X_j - b_j$, $j = r + 1, \ldots, n$, can be expressed (on a set of probability 1) as linear combinations of the $X_j - b_j$, $1 \le j \le r$.

The result that nonsingularity of $K$ is equivalent to linear independence of the $X_j - b_j$ holds for arbitrary random variables with finite second moments, as the above analysis shows.


A5.6 Theorem. (a) If $X$ is Gaussian and $Z = CX$ where $C$ is any $m$ by $n$ matrix, then $Z$ is Gaussian.

(b) If $X_1, \ldots, X_n$ are jointly Gaussian, then so are $X_1, \ldots, X_r$ for any $r \le n$.

(c) If $X_1, \ldots, X_n$ are jointly Gaussian, then for any constants $c_1, \ldots, c_n$, $\sum_{j=1}^n c_j X_j$ is a normally distributed random variable.


Proof. (a) Let $X = AY + b$ where the $Y_j$ are independent normal random variables with 0 mean. Then $Z = CAY + Cb$, which is Gaussian by Theorem A5.2.

(b) In part (a) take $C = [I\ \ 0]$ where $I$ is an $r$ by $r$ identity matrix.

(c) In part (a) take $C = [c_1\ c_2\ \ldots\ c_n]$. □


A5.7 Example. Let $X$ be normal (0, 1) and define $Y$ as follows. Let $Z$ take on the values 1 and 0 with equal probability, with $X$ and $Z$ independent. If $Z = 1$, set $Y = X$, and if $Z = 0$, take $Y = -X$. Then with probability $\frac12$ we have $X + Y = 0$, so that $X + Y$ is certainly not Gaussian. By Theorem A5.6(c), $X$ and $Y$ are not jointly Gaussian. However, $X$ is Gaussian by assumption, and $Y$ is also Gaussian because $-X$ is normal (0, 1) and therefore,

$$P\{Y \le y\} = \tfrac12 P\{X \le y\} + \tfrac12 P\{-X \le y\} = P\{X \le y\}.$$

Thus the converse assertion fails in both A5.6(b) and A5.6(c).
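The example is easy to simulate. The sketch below (with the hypothetical helper `sample_xy`) draws pairs $(X, Y)$ as described and estimates two quantities: the fraction of draws with $X + Y = 0$, which should be near $\frac12$, and $P\{Y \le 1\}$, which should be near the standard normal value of about 0.8413.

```python
import random

def sample_xy(rng):
    """Draw (X, Y) with X normal(0, 1), Z a fair 0/1 coin, Y = X if Z = 1, else -X."""
    x = rng.gauss(0.0, 1.0)
    z = rng.randint(0, 1)
    y = x if z == 1 else -x
    return x, y

if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [sample_xy(rng) for _ in range(100_000)]
    frac_zero = sum(1 for x, y in pairs if x + y == 0.0) / len(pairs)
    frac_y_le_1 = sum(1 for _, y in pairs if y <= 1.0) / len(pairs)
    print(frac_zero, frac_y_le_1)
```

An atom of mass about $\frac12$ at 0, combined with a continuous part, is incompatible with $X + Y$ being normal, while the marginal of $Y$ is indistinguishable from that of $X$.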


A5.8 Theorem. If $X_1, \ldots, X_n$ are jointly Gaussian and the $X_j$ are uncorrelated, that is, the covariance of $X_j$ and $X_k$ is 0 for every $j \ne k$, then $X_1, \ldots, X_n$ are independent.


Proof. The covariance matrix is diagonal with entries $\lambda_j = \operatorname{Var} X_j$. We may assume without loss of generality that all $\lambda_j$ are strictly positive, because if some $\lambda_j = 0$ then $X_j$ is constant a.e. and can be deleted. Then $K$ is nonsingular and $K^{-1}$ is diagonal with entries $1/\lambda_j$. By Theorem A5.5, the joint density of $X_1, \ldots, X_n$ is

$$f(x_1, \ldots, x_n) = (2\pi)^{-n/2}(\lambda_1 \cdots \lambda_n)^{-1/2}\exp\Big[-\frac12\sum_{j=1}^n \frac{(x_j - b_j)^2}{\lambda_j}\Big].$$

The form of the density shows that the $X_j$ are independent, with $X_j$ normal with mean $b_j$ and variance $\lambda_j$. □
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SOLUTIONS TO PROBLEMS 


CHAPTER 1 
Section 1.1 
2. We have $\limsup_n A_n = (-1, 1]$, $\liminf_n A_n = \{0\}$. 
3. Using $\limsup_n A_n = \{\omega : \omega \in A_n \text{ for infinitely many } n\}$, $\liminf_n A_n = \{\omega : \omega \in A_n \text{ for all but finitely many } n\}$, we obtain 

$$\liminf_n A_n = \{(x, y) : x^2 + y^2 < 1\},$$

$$\limsup_n A_n = \{(x, y) : x^2 + y^2 \le 1\} - \{(0, 1), (0, -1)\}.$$

If $x = \limsup_{n \to \infty} x_n$, then $\limsup_n A_n$ is either $(-\infty, x)$ or $(-\infty, x]$. If 
$y \in A_n$ for infinitely many n, then $x_n > y$ for infinitely many n; hence 
$x \ge y$. Thus $\limsup_n A_n \subset (-\infty, x]$. But if $y < x$, then $x_n > y$ for infinitely 
many n, so $y \in \limsup_n A_n$. Thus $(-\infty, x) \subset \limsup_n A_n$, and 
the result follows. The same result is valid for lim inf; the above analysis 
applies, with “eventually” replacing “for infinitely many n.” 


Section 1.2 


4. (a) If $-\infty \le a < b < c < \infty$, then $\mu(a, c] = \mu(a, b] + \mu(b, c]$, and 
$\mu(a, \infty) = \mu(a, b] + \mu(b, \infty)$; finite additivity follows quickly. If 
$A_n = (-\infty, n]$, then $A_n \uparrow \mathbb{R}$, but $\mu(A_n) = n \not\to \mu(\mathbb{R}) = 0$. Thus $\mu$ 
is not continuous from below, hence not countably additive. 

(b) Finiteness of $\mu$ follows from the definition; since $\mu(-\infty, n] \to \infty$, $\mu$ is unbounded. 

We have $\mu(\bigcup_{i=1}^{\infty} A_i) \ge \mu(\bigcup_{i=1}^{n} A_i) = \sum_{i=1}^{n} \mu(A_i)$ for all n; let $n \to \infty$ to 
obtain the desired result. 

8. The minimal σ-field $\mathcal{F}$ (which is also the minimal field) consists of the 
collection $\mathcal{G}$ of all (finite) unions of sets of the form $B_1 \cap B_2 \cap \cdots \cap B_n$, 
where $B_i$ is either $A_i$ or $A_i^c$. Any σ-field containing $A_1, \dots, A_n$ must contain 
all sets in $\mathcal{G}$; hence $\mathcal{G} \subset \mathcal{F}$. But $\mathcal{G}$ is a σ-field; hence $\mathcal{F} \subset \mathcal{G}$. Since there 
are $2^n$ disjoint sets of the form $B_1 \cap \cdots \cap B_n$, and each such set may or 
may not be included in a typical set in $\mathcal{F}$, $\mathcal{F}$ has at most $2^{2^n}$ members. 
The upper bound is attained if all sets $B_1 \cap \cdots \cap B_n$ are nonempty. When 
n = 2, the sets are $\emptyset$, $\Omega$, $A_1 \cap A_2$, $A_1 \cap A_2^c$, $A_1^c \cap A_2$, $A_1^c \cap A_2^c$, along with all 
sets that can be generated from these by taking unions 2 and 3 at a time. 
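The counting argument can be tried on a small finite space. The sketch below is illustrative only: the function name `generated_field` and the sample sets are assumptions, not part of the text. It forms the nonempty atoms $B_1 \cap \cdots \cap B_n$ and then takes all unions of atoms.

```python
from itertools import combinations, product

def generated_field(omega, sets):
    """Return the field generated by `sets` on the finite space `omega`, as a set of frozensets."""
    omega = frozenset(omega)
    atoms = set()
    # Each atom is an intersection B_1 & ... & B_n with B_i equal to A_i or its complement.
    for choice in product([True, False], repeat=len(sets)):
        atom = omega
        for keep, a in zip(choice, sets):
            atom &= frozenset(a) if keep else omega - frozenset(a)
        if atom:
            atoms.add(atom)
    atoms = list(atoms)
    field = set()
    # Every member of the field is a union of some sub-collection of the atoms.
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            field.add(frozenset().union(*combo))
    return field

full = generated_field({1, 2, 3, 4}, [{1, 2}, {2, 3}])       # all four atoms nonempty
nested = generated_field({1, 2, 3, 4}, [{1, 2}, {1, 2, 3}])  # one atom is empty
print(len(full), len(nested))
```

With n = 2 the first call attains the bound $2^{2^2} = 16$, while the nested pair has only three nonempty atoms and so generates 8 sets.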
9. (a) Asin Problem 8, any field over & must contain all sets in ; hence 
F C F. But F is a field; hence. ¥ c ¥. For if A; = [j1 Bij, then 
Uii AD = Ni Uji Bf which belongs to ¥ because of the 
distributive law AN (BUC) = (AN B)U(ANC). 
(b) Note that the complement of a finite intersection (Vint B;; belongs 
to Z; for example, if Bi, B2 € 4, then 


(Bi N BS) = BS UB, 
= (Bi N B2) U (BS N B5) U (B2 N B1) U (B2 A B$) EY, 


Now & is closed under finite intersection by the distributive law, 
and it follows from this and the above remark that & is closed 
under complementation and is therefore a field. Just as in the proof 
that F= F, we find that ¥ = Z. 

(c) This is immediate from (a) and (b). 

11. (a) LetA, € An=1,2,.... Then A, belongs to some %, , and we 
may assume a < œz <---, SO Ža, C me Let a = sup, Qn < 
Bi. Then all Z, C Za, hence all A, € £a. Thus |], An € @a+1 C 
SF, so |], An € % If A € Z then A belongs to some %,; hence 
A E€ Gul CY. 

(b) We have card 4 < c for all a. This is true for a = 0, by hypoth- 
esis. If it is true for all 8 < æ, then U,_, Z% has cardinality at most 
(card a)c = c. Now if Z has cardinality c, then & has cardinality 
at most cè? = (2%°)8° — 28° = c. Thus card % < c. It follows that 
U.,<s, Za has cardinality at most c. 


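Section 1.3 


3. (a) Since $\lambda(\emptyset) = 0$ we have $\Omega \in \mathcal{M}$, and $\mathcal{M}$ is clearly closed under 
complementation. If $E, F \in \mathcal{M}$ and $A \subset \Omega$, then 

$$\lambda[A \cap (E \cup F)] = \lambda[A \cap (E \cup F) \cap E] + \lambda[A \cap (E \cup F) \cap E^c] \quad \text{since } E \in \mathcal{M}$$
$$= \lambda(A \cap E) + \lambda(A \cap F \cap E^c).$$

Thus 

$$\lambda[A \cap (E \cup F)] + \lambda[A \cap (E \cup F)^c]$$
$$= \lambda(A \cap E) + \lambda(A \cap E^c \cap F) + \lambda(A \cap E^c \cap F^c)$$
$$= \lambda(A \cap E) + \lambda(A \cap E^c) \quad \text{since } F \in \mathcal{M}$$
$$= \lambda(A) \quad \text{since } E \in \mathcal{M}.$$

This proves that $\mathcal{M}$ is a field. Also, if E and F are disjoint we have 

$$\lambda[A \cap (E \cup F)] = \lambda[A \cap (E \cup F) \cap E] + \lambda[A \cap (E \cup F) \cap E^c]$$
$$= \lambda(A \cap E) + \lambda(A \cap F \cap E^c)$$
$$= \lambda(A \cap E) + \lambda(A \cap F) \quad \text{since } E \cap F = \emptyset.$$

Now if the $E_n$ are disjoint sets in $\mathcal{M}$ and $F_n = \bigcup_{i=1}^{n} E_i \uparrow E$, then 

$$\lambda(A) = \lambda(A \cap F_n) + \lambda(A \cap F_n^c) \quad \text{since } F_n \text{ belongs to the field } \mathcal{M}$$
$$\ge \lambda(A \cap F_n) + \lambda(A \cap E^c) \quad \text{since } E^c \subset F_n^c \text{ and } \lambda \text{ is monotone}$$
$$= \sum_{i=1}^{n} \lambda(A \cap E_i) + \lambda(A \cap E^c) \quad \text{by what we have proved above.}$$

Since n is arbitrary, 

$$\lambda(A) \ge \sum_{n=1}^{\infty} \lambda(A \cap E_n) + \lambda(A \cap E^c)$$
$$\ge \lambda(A \cap E) + \lambda(A \cap E^c) \quad \text{by countable subadditivity of } \lambda.$$

Thus $E \in \mathcal{M}$, proving that $\mathcal{M}$ is a σ-field. 
Now $\lambda(A \cap E) + \lambda(A \cap E^c) \ge \lambda(A)$ by subadditivity, hence 

$$\lambda(A) = \sum_{n=1}^{\infty} \lambda(A \cap E_n) + \lambda(A \cap E^c).$$

Replace A by $A \cap E$ to obtain $\lambda(A \cap E) = \sum_{n=1}^{\infty} \lambda(A \cap E_n)$, as desired. 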
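(b) All properties are immediate except for countable subadditivity. If 
$A = \bigcup_{n=1}^{\infty} A_n$, we must show that $\mu^*(A) \le \sum_{n=1}^{\infty} \mu^*(A_n)$, and we 
may assume that $\mu^*(A_n) < \infty$ for all n. Given $\varepsilon > 0$, we may choose 
sets $E_{nk} \in \mathcal{F}_0$ with $A_n \subset \bigcup_k E_{nk}$ and $\sum_k \mu(E_{nk}) \le \mu^*(A_n) + \varepsilon 2^{-n}$. 
Then $A \subset \bigcup_{n,k} E_{nk}$ and $\sum_{n,k} \mu(E_{nk}) \le \sum_n \mu^*(A_n) + \varepsilon$. Thus $\mu^*(A) \le \sum_n \mu^*(A_n) + \varepsilon$, $\varepsilon$ arbitrary. 

Now if $A \in \mathcal{F}_0$, then $\mu^*(A) \le \mu(A)$ by definition of $\mu^*$, and if 
$A \subset \bigcup_n E_n$, $E_n \in \mathcal{F}_0$, then $\mu(A) \le \sum_n \mu(E_n)$ by 1.2.5 and 1.3.1. 
Take the infimum over all such coverings of A to obtain $\mu(A) \le \mu^*(A)$; hence $\mu^* = \mu$ on $\mathcal{F}_0$. 

(c) If $F \in \mathcal{F}_0$, $A \subset \Omega$, we must show that $\mu^*(A) \ge \mu^*(A \cap F) + \mu^*(A \cap F^c)$; 
we may assume $\mu^*(A) < \infty$. Given $\varepsilon > 0$, there are sets 
$E_n \in \mathcal{F}_0$ with $A \subset \bigcup_n E_n$ and $\sum_n \mu(E_n) \le \mu^*(A) + \varepsilon$. Now 

$$\mu^*(A \cap F) \le \mu^*\Big(\bigcup_n (E_n \cap F)\Big) \quad \text{by monotonicity}$$
$$\le \sum_n \mu(E_n \cap F)$$

since $\mu^*$ is countably subadditive and $\mu^* = \mu$ on $\mathcal{F}_0$. Similarly, 

$$\mu^*(A \cap F^c) \le \sum_n \mu(E_n \cap F^c).$$

Thus 

$$\mu^*(A \cap F) + \mu^*(A \cap F^c) \le \sum_n \mu(E_n) \le \mu^*(A) + \varepsilon,$$

and the result follows. 

(d) If $A = B \cup N$, where $B \in \sigma(\mathcal{F}_0)$, $N \subset M \in \sigma(\mathcal{F}_0)$, $\mu^*(M) = 0$, then 
$B \in \mathcal{M}$ [note $\mathcal{F}_0 \subset \mathcal{M}$ and $\mathcal{M}$ is a σ-field, so $\sigma(\mathcal{F}_0) \subset \mathcal{M}$]. Also, 
any set C with $\mu^*(C) = 0$ belongs to $\mathcal{M}$ by definition of $\mu^*$-measurability; 
hence $A \in \mathcal{M}$. Therefore the completion of $\sigma(\mathcal{F}_0)$ is included in $\mathcal{M}$. 

Now assume $\mu$ σ-finite on $\mathcal{F}_0$, and let $A \in \mathcal{M}$. If $\Omega$ is the disjoint 
union of sets $A_n \in \mathcal{F}_0$ with $\mu(A_n) < \infty$, then by definition of $\mu^*$, there 
is a set $B_n \in \sigma(\mathcal{F}_0)$ such that $A \cap A_n \subset B_n$ and $\mu^*(B_n - (A \cap A_n)) = 0$. 
[Note that if $A \notin \mathcal{M}$ we obtain only $\mu^*(B_n) = \mu^*(A \cap A_n)$; however, 
if $A \in \mathcal{M}$ (so that $A \cap A_n$ also belongs to $\mathcal{M}$), we have 

$$\mu^*(B_n - (A \cap A_n)) = \mu^*(B_n) - \mu^*(A \cap A_n) = 0.]$$

If $B = \bigcup_n B_n$, then $B \in \sigma(\mathcal{F}_0)$, $A \subset B$, and $\mu^*(B - A) = 0$. This argument 
applied to $A^c$ yields a set $C \in \sigma(\mathcal{F}_0)$ with $C \subset A$ and 
$\mu^*(A - C) = 0$. Therefore, $A = C \cup (A - C)$ with $C \in \sigma(\mathcal{F}_0)$, $A - C \subset B - C \in \sigma(\mathcal{F}_0)$, 
and $\mu^*(B - C) = \mu^*(B - A) + \mu^*(A - C) = 0$. 
Thus A belongs to the completion of $\sigma(\mathcal{F}_0)$ relative to $\mu^*$. 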


Section 1.4 


1. Using the formulas of 1.4.5, the following results are obtained: 

(a) 3; (b) 8.5; (c) 5; 
(d) 7.25; (e) $\mu(5, \infty) + \mu(-\infty, -4) = 7.25$. 


Let $C_k = \{x \in \mathbb{R}^n : -k < x_i \le k,\ i = 1, \dots, n\}$. Then $\mu$ is finite on $C_k$; 
hence the Borel subsets B of $C_k$ such that $\mu(a + B) = \mu(B)$ form a monotone 
class including the field of finite disjoint unions of right-semiclosed 
intervals in $C_k$; hence all Borel subsets of $C_k$ belong to the class (see 
1.2.2). If $B \in \mathcal{B}(\mathbb{R}^n)$, then $B \cap C_k \uparrow B$; hence $a + (B \cap C_k) \uparrow a + B$, and 
it follows that $\mu(a + B) = \mu(B)$. 

Now if $B \in \overline{\mathcal{B}}(\mathbb{R}^n)$, then $B = A \cup C$, $A \in \mathcal{B}(\mathbb{R}^n)$, $C \subset D \in \mathcal{B}(\mathbb{R}^n)$, 
with $\mu(D) = 0$. Thus $a + B = (a + A) \cup (a + C)$, and, by Problem 3, 
$a + A \in \mathcal{B}(\mathbb{R}^n)$, $a + C \subset a + D \in \mathcal{B}(\mathbb{R}^n)$. By what we have proved 
above, $\mu(a + D) = \mu(D) = 0$; hence $a + B \in \overline{\mathcal{B}}(\mathbb{R}^n)$ and $\mu(a + B) = \mu(B)$. 

Let A be the unit cube $\{x \in \mathbb{R}^n : 0 < x_i \le 1,\ i = 1, \dots, n\}$, and let 
$c = \mu(A)$. For any positive integer r, we may divide each edge of A 
into r equal parts, so that A is decomposed into $r^n$ subcubes $A_1, \dots, A_{r^n}$, 
each with volume $r^{-n}$. By translation-invariance, $\mu(A_i)$ is the same for 
all i, so if $\lambda$ is Lebesgue measure, we have 

$$\mu(A_i) = r^{-n}\mu(A) = r^{-n}c = r^{-n}c\,\lambda(A) = c\,\lambda(A_i), \quad i = 1, \dots, r^n.$$

Now any subinterval I of the unit cube can be expressed as the limit of 
an increasing sequence of sets $B_k$, where each $B_k$ is a finite disjoint union 
of subcubes of the above type. Thus $\mu = c\lambda$ on subintervals of the unit 
cube, and hence on all Borel subsets of the unit cube by the Carathéodory 
extension theorem. Since $\mathbb{R}^n$ is a countable disjoint union of cubes, it 
follows that $\mu = c\lambda$ on $\mathcal{B}(\mathbb{R}^n)$. 


(a) If $r + x_1 = s + x_2$, $x_1, x_2 \in A$, then $x_1$ is equivalent to $x_2$, so that 
$x_1 = x_2$, since A was constructed by taking one member from each 
distinct $B_x$. Thus r = s, a contradiction. 

If $x \in \mathbb{R}$, then $x \in B_x$; if y is the member of $B_x$ that belongs to A, 
then x − y is a rational number r, hence $x \in r + A$. 

(b) If $0 \le r \le 1$, then $r + A \subset [0, 2]$; thus 

$$\sum\{\mu(r + A) : 0 \le r \le 1,\ r \text{ rational}\} = \mu\Big(\bigcup\{r + A : 0 \le r \le 1,\ r \text{ rational}\}\Big) \quad \text{by (a)}$$
$$\le \mu[0, 2] < \infty.$$

But $\mu(r + A) = \mu(A)$ by Problem 4; hence $\mu(r + A)$ must be 0 
for all r. Since $\mathbb{R}$ is a countable union of sets r + A by (a), $\mu(\mathbb{R}) = 0$, 
a contradiction. 
8. Let $F(x, y) = 1$ if $x + y \ge 0$; $F(x, y) = 0$ if $x + y < 0$. If $a_1 = 0$, 
$b_1 = 1$, $a_2 = -1$, $b_2 = 0$, then 

$$\Delta_{b_1 a_1} \Delta_{b_2 a_2} F(x, y) = F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)$$
$$= 1 - 1 - 1 + 0 = -1;$$

hence F is not a distribution function. Other examples: $F(x, y) = \max(x, y)$, 
$F(x, y) = [x + y]$, the largest integer less than or equal to x + y. 
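The rectangle differences can be evaluated directly. The sketch below is illustrative: the helper names and the rectangles chosen for the two other examples are assumptions, not taken from the text. A negative value shows that the function assigns negative mass to some rectangle and so is not a distribution function.

```python
import math

def rectangle_difference(F, a1, b1, a2, b2):
    """Second-order difference of F over the rectangle (a1, b1] x (a2, b2]."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

def step(x, y):
    """F(x, y) = 1 if x + y >= 0, and 0 otherwise."""
    return 1 if x + y >= 0 else 0

def integer_part(x, y):
    """F(x, y) = largest integer less than or equal to x + y."""
    return math.floor(x + y)

d_step = rectangle_difference(step, 0, 1, -1, 0)           # the rectangle used in the text
d_max = rectangle_difference(max, 0, 1, 0, 1)              # assumed rectangle for max(x, y)
d_floor = rectangle_difference(integer_part, 0, 0.5, 0.5, 1)  # assumed rectangle for [x + y]
print(d_step, d_max, d_floor)
```

All three differences come out negative for these rectangles.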


Section 1.5 


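2. If $B \in \mathcal{B}(\mathbb{R})$, 

$$\{\omega : h(\omega) \in B\} = \{\omega \in A : h(\omega) \in B\} \cup \{\omega \in A^c : h(\omega) \in B\}$$
$$= [A \cap f^{-1}(B)] \cup [A^c \cap g^{-1}(B)],$$

which belongs to $\mathcal{F}$ since f and g are Borel measurable. 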


5. (a) $\{x : f \text{ is discontinuous at } x\} = \bigcup_{n=1}^{\infty} D_n$, where $D_n = \{x \in \mathbb{R}^k$: for 
all $\delta > 0$, there exist $x_1, x_2 \in \mathbb{R}^k$ such that $|x_1 - x| < \delta$ and $|x_2 - x| < \delta$, 
but $|f(x_1) - f(x_2)| \ge 1/n\}$. We show that the $D_n$ are closed. 
Let $\{x_\alpha\}$ be a sequence of points in $D_n$ with $x_\alpha \to x$. If $\delta > 0$ 
and $N = \{y : |y - x| < \delta\}$, then $x_\alpha \in N$ for large $\alpha$, and since $x_\alpha \in D_n$, 
there are points $x_{\alpha 1}$ and $x_{\alpha 2} \in N$ such that $|f(x_{\alpha 1}) - f(x_{\alpha 2})| \ge 1/n$. 
Thus $|x_{\alpha 1} - x| < \delta$, $|x_{\alpha 2} - x| < \delta$, but $|f(x_{\alpha 1}) - f(x_{\alpha 2})| \ge 1/n$, 
so that $x \in D_n$. 

The result is true for a function from an arbitrary topological space 
S to a metric space (T, d). Take $D_n = \{x \in S$: for every neighborhood 
N of x, there exist $x_1, x_2 \in N$ such that $d(f(x_1), f(x_2)) \ge 1/n\}$. 
(The above proof goes through with “sequence” replaced by “net.”) 
The result is false if no assumptions are made about the topology 
of the range space. For example, let $\Omega = \{1, 2, 3\}$, with open sets 
$\emptyset$, $\Omega$, and $\{1\}$. Define $f : \Omega \to \Omega$ by $f(1) = f(3) = 1$, $f(2) = 2$. 
Then the set of discontinuities is $\{3\}$, which is not an $F_\sigma$. 
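The three-point example is small enough to check by brute force. The following sketch is illustrative only; the function names are assumptions. It tests continuity at each point using the neighborhood definition and then tests whether a set is a union of closed sets (in a finite space every $F_\sigma$ is such a finite union).

```python
def discontinuity_points(points, open_sets, f):
    """Points where f (a dict from points to points) is discontinuous for the given finite topology."""
    opens = [frozenset(u) for u in open_sets]
    bad = set()
    for x in points:
        for v in opens:
            if f[x] in v:
                preimage = {p for p in points if f[p] in v}
                # Continuity at x requires an open neighborhood of x inside the preimage of v.
                if not any(x in u and u <= preimage for u in opens):
                    bad.add(x)
    return bad

def is_union_of_closed_sets(subset, points, open_sets):
    """True if `subset` is a union of closed sets of the finite topology."""
    closed = [frozenset(points) - frozenset(u) for u in open_sets]
    covered = set().union(*[c for c in closed if c <= subset])
    return covered == set(subset)

points = {1, 2, 3}
open_sets = [set(), {1, 2, 3}, {1}]
f = {1: 1, 2: 2, 3: 1}

bad = discontinuity_points(points, open_sets, f)
print(bad, is_union_of_closed_sets(bad, points, open_sets))
```

The closed sets here are $\emptyset$, $\{2, 3\}$, and $\Omega$, so $\{3\}$ cannot be written as a union of closed sets.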

(b) This follows from part (a) because the irrationals I cannot be expressed 
as a countable union of closed sets $C_n$. If this were possible, 
then each $C_n$ would have empty interior since every nonempty open 
set contains rational points. But then I is of category 1 in $\mathbb{R}$, and 
since $Q = \mathbb{R} - I$ is of category 1 in $\mathbb{R}$, it follows that $\mathbb{R}$ is of category 
1 in itself, contradicting the Baire category theorem. 


6. By Problem 11 in 1.4, there are c Borel subsets of $\mathbb{R}^n$; hence there are only 
c simple functions on $\mathbb{R}^n$. Since a Borel measurable function is the limit 
of a sequence of simple functions, there are $c^{\aleph_0} = c$ Borel measurable 
functions from $\mathbb{R}^n$ to $\mathbb{R}$. By 1.5.8, there are only c Borel measurable 
functions from $\mathbb{R}^n$ to $\mathbb{R}^k$. 

7. (a) Since the P, are measures, $`, Pa (Ax) = P, (Q) = 1, and it follows 

quickly that the a,, satisfy the hypotheses of Steinhaus’ lemma. If 
{xn} 18 the sequence given by the lemma, let $ = {k: x, = 1} and 
let B be the union of the sets A;, k € S. Then 


1 


f, = —— 
É l—ea 


I 
YOP, (Ax) — PA) = —— P (B) — Sra] 


kes l—a kes 


and it follows that t, converges, a contradiction. Thus P is a probabil- 
ity measure. If B; € F, B} | Ø, then given e > 0, we have P(A;,) < € 
for large k, say, for k > ko. Thus P, (An) < £ for large n, say, for 
n > no. Since the A; decrease, we have sup,,..,,, Pn(Ax) < € for k > 
ko, and since A; | Ø, there is a kı such that for n = 1,2,...,n9— 
1, Pa (Ay) < € for k > kı. Thus sup, Pa (Ay) < £, k > max(ko, kı). 

(b) Without loss of generality, assume P, (Q) < 1 for all n. Add a point 
(call it oo) to the space and set P,, {oo} = 1 — P, (Q) > 1 — P(Q) = 
P{oo}. The P,, are now probability measures, and the result follows 
from part (a). 


Section 1.6 


2. $\int_\Omega \sum_{n=1}^{\infty} |f_n|\,d\mu = \sum_{n=1}^{\infty} \int_\Omega |f_n|\,d\mu < \infty$; hence $\sum_{n=1}^{\infty} |f_n|$ is integrable 
and therefore finite a.e. Thus $\sum_{n=1}^{\infty} f_n$ converges a.e. to a finite-valued 
function g. 

Let $g_n = \sum_{k=1}^{n} f_k$. Then $|g_n| \le \sum_{k=1}^{\infty} |f_k|$, an integrable function. By 
the dominated convergence theorem, $\int_\Omega g_n\,d\mu \to \int_\Omega g\,d\mu$, that is, 

$$\sum_{n=1}^{\infty} \int_\Omega f_n\,d\mu = \int_\Omega \sum_{n=1}^{\infty} f_n\,d\mu.$$

3. Let $x_0 \in (c, d)$, and let $x_n \to x_0$, $x_n \ne x_0$. Then 

$$\frac{1}{x_n - x_0}\Big[\int_a^b f(x_n, y)\,dy - \int_a^b f(x_0, y)\,dy\Big] = \int_a^b \Big[\frac{f(x_n, y) - f(x_0, y)}{x_n - x_0}\Big]\,dy.$$

By the mean value theorem, 

$$\frac{f(x_n, y) - f(x_0, y)}{x_n - x_0} = f_1(\lambda_n, y)$$

for some $\lambda_n = \lambda_n(y)$ between $x_n$ and $x_0$. By hypothesis, $|f_1(\lambda_n, y)| \le h(y)$, 
where h is integrable, and the result now follows from the dominated 
convergence theorem (since $[f(x_n, y) - f(x_0, y)]/[x_n - x_0] \to f_1(x_0, y)$, 
$f_1(x, \cdot)$ is Borel measurable for each x). 


Let $\mu$ be Lebesgue measure. If f is an indicator $I_B$, $B \in \mathcal{B}(\mathbb{R})$, the result 
to be proved states that $\mu(B) = \mu(a + B)$, which holds by translation-invariance 
of $\mu$ (Problem 4 in 1.4). The passage to nonnegative simple 
functions, nonnegative measurable functions, and arbitrary measurable 
functions is done as in 1.6.12. 


Section 1.7 


2. (a) If f is Riemann-Stieltjes integrable, $\alpha = f = \beta$ a.e. $[\mu]$ as in 1.7.1(a). 
Thus the set of discontinuities of f is a subset of a set of $\mu$-measure 0, 
together with the endpoints of the subintervals of the $P_k$. Take a different 
sequence of partitions having the original endpoints as interior 
points to conclude that f is continuous a.e. $[\mu]$. Conversely, if f is 
continuous a.e. $[\mu]$, then $\alpha = f = \beta$ a.e. $[\mu]$. [The result that f is 
continuous at x implies $\alpha(x) = f(x) = \beta(x)$ is true even if x is an 
endpoint.] As in 1.7.1(a), f is Riemann-Stieltjes integrable. 

(b) This is done exactly as in 1.7.1(b). 


(a) By definition of the improper Riemann integral, f must be Riemann 
integrable (hence continuous a.e.) on each bounded interval, and the 
result follows. For the counterexample to the converse, take $f(x) = 1$, 
$n \le x < n + 1$, n an even integer, $f(x) = -1$, $n \le x < n + 1$, n 
an odd integer. Then the limit of $r_{ab}(f)$ does not exist. (Alternatively, 
take f(x) identically 1; then $r_{ab}(f) \to +\infty$ as $a \to -\infty$, $b \to \infty$.) 

(b) Define 

$$f_n(x) = \begin{cases} f(x) & \text{if } -n \le x \le n, \\ 0 & \text{elsewhere.} \end{cases}$$

Then $f_n \uparrow f$; hence f is measurable relative to the completed σ-field; 
also $\int_\Omega f_n\,d\mu \uparrow \int_\Omega f\,d\mu$ by the monotone convergence theorem. 
But $\int_\Omega f_n\,d\mu = r_{-n,n}(f)$ by 1.7.1(b), and $r_{-n,n}(f) \to r(f)$ by 
hypothesis; the result follows. 

For the counterexample, take 

$$f(x) = \begin{cases} \dfrac{(-1)^n}{n + 1}, & n \le x < n + 1,\ n = 0, 1, \dots, \\ 0, & x < 0. \end{cases}$$

We have $r(f) = 1 - \tfrac12 + \tfrac13 - \tfrac14 + \cdots$, but 

$$\int_{\mathbb{R}} |f|\,d\mu = 1 + \tfrac12 + \tfrac13 + \cdots = \infty,$$

so that f is not Lebesgue integrable on $\mathbb{R}$. 
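The contrast between the two series can be seen from partial sums. This is a small illustrative sketch; the function name and the number of terms are assumptions. The comparison with log 2 uses the standard value of the alternating harmonic series, which is not stated in the text.

```python
import math

def partial_sums(num_terms):
    """Return (signed partial sum, absolute partial sum) of the terms (-1)^n / (n + 1)."""
    signed = 0.0
    absolute = 0.0
    for n in range(num_terms):
        term = (-1) ** n / (n + 1)
        signed += term           # integral of f over [0, num_terms)
        absolute += abs(term)    # integral of |f| over [0, num_terms)
    return signed, absolute

signed, absolute = partial_sums(100_000)
print(signed, math.log(2), absolute)
```

The signed sums settle near log 2, while the absolute sums keep growing like the logarithm of the number of terms.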
CHAPTER 2 
Section 2.1 


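2. Let $D = \{\omega : f(\omega) < 0\}$; then $\lambda(A \cap D) \le 0$, $\lambda(A \cap D^c) \ge 0$ for all $A \in \mathcal{F}$. 
By 2.1.3(d), 

$$\lambda^+(A) = \lambda(A \cap D^c) = \int_A f^+\,d\mu$$

since $f^+ = f$ on $D^c$ and $f^+ = 0$ on D. Similarly, 

$$\lambda^-(A) = -\lambda(A \cap D) = \int_A f^-\,d\mu$$

since $f^- = -f$ on D, and $f^- = 0$ on $D^c$. The result follows. 
4. If $E_1, \dots, E_n$ are disjoint sets in $\mathcal{F}$, with all $E_i \subset A$, 

$$\sum_{i=1}^{n} |\lambda(E_i)| = \sum_{i=1}^{n} |\lambda^+(E_i) - \lambda^-(E_i)| \le \sum_{i=1}^{n} [\lambda^+(E_i) + \lambda^-(E_i)]$$
$$= |\lambda|\Big(\bigcup_{i=1}^{n} E_i\Big) \le |\lambda|(A).$$

Thus the sup of the terms $\sum_{i=1}^{n} |\lambda(E_i)|$ is at most $|\lambda|(A)$. But 

$$|\lambda|(A) = \lambda^+(A) + \lambda^-(A)$$
$$= \lambda(A \cap D^c) - \lambda(A \cap D)$$
$$= |\lambda(A \cap D^c)| + |\lambda(A \cap D)| \quad \text{since } \lambda(A \cap D^c) \ge 0 \text{ and } \lambda(A \cap D) \le 0$$
$$= |\lambda(E_1)| + |\lambda(E_2)|,$$

where $E_1 = A \cap D^c$ and $E_2 = A \cap D$, and the result follows. 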
Section 2.2 


2. Let $A_n = \{\omega : |g(\omega)| \ge 1/n\}$, n = 1, 2, ..., so that $A = \bigcup_n A_n$. Now 

$$\infty > \int_{A_n} |g|\,d\mu \ge \frac{1}{n}\,\mu(A_n);$$

hence $\mu(A_n) < \infty$. For the example, let $\mu$ be Lebesgue measure on $\mathcal{B}(\mathbb{R})$, 
and let g(x) be any strictly positive integrable function, such as $g(x) = e^{-|x|}$. 
In this case, $A = \mathbb{R}$, so that $\mu(A) = \infty$. 
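For the function $g(x) = e^{-|x|}$ the sets $A_n$ are intervals, so the bound can be checked in closed form. The sketch below is illustrative; the function names are assumptions, and the formulas $A_n = [-\log n, \log n]$ and $\int_{A_n} g\,d\mu = 2(1 - 1/n)$ are computed here rather than quoted from the text.

```python
import math

def level_set_measure(n):
    """Lebesgue measure of A_n = {x : exp(-|x|) >= 1/n} = [-log n, log n]."""
    return 2.0 * math.log(n)

def integral_over_level_set(n):
    """Integral of exp(-|x|) over [-log n, log n], in closed form."""
    return 2.0 * (1.0 - 1.0 / n)

for n in (1, 10, 1000, 10**6):
    print(n, level_set_measure(n), integral_over_level_set(n))
```

Each $A_n$ has finite measure and satisfies $(1/n)\mu(A_n) \le \int_{A_n} |g|\,d\mu$, while $\mu(A_n)$ grows without bound as n increases.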


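4. If f is an indicator $I_A$, the result is true by hypothesis. If f is a nonnegative 
simple function $\sum_{j=1}^{n} x_j I_{A_j}$, the $A_j$ disjoint sets in $\mathcal{F}$, then 

$$\int_\Omega f\,d\lambda = \sum_{j=1}^{n} x_j \lambda(A_j) = \sum_{j=1}^{n} x_j \int_{A_j} g\,d\mu = \sum_{j=1}^{n} x_j \int_\Omega I_{A_j} g\,d\mu$$
$$= \int_\Omega f g\,d\mu \quad \text{by the additivity theorem.}$$

If f is a nonnegative Borel measurable function, let $f_1, f_2, \dots$ be nonnegative 
simple functions increasing to f. By what we have just proved, 
$\int_\Omega f_n\,d\lambda = \int_\Omega f_n g\,d\mu$; hence $\int_\Omega f\,d\lambda = \int_\Omega f g\,d\mu$ by the monotone convergence 
theorem. Finally, if f is an arbitrary Borel measurable function, 
write $f = f^+ - f^-$. By what we have just proved, 

$$\int_\Omega f^+\,d\lambda = \int_\Omega f^+ g\,d\mu, \qquad \int_\Omega f^-\,d\lambda = \int_\Omega f^- g\,d\mu$$

and the result follows from the additivity theorem. 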


6. (a) In the definition of $|\lambda|$, we may assume without loss of generality that 
the $E_j$ partition A. If A is the disjoint union of sets $A_1, A_2, \dots \in \mathcal{F}$, then 

$$\sum_{j=1}^{n} |\lambda(E_j)| = \sum_{j=1}^{n} \Big|\sum_{i=1}^{\infty} \lambda(E_j \cap A_i)\Big| \le \sum_{j=1}^{n} \sum_{i=1}^{\infty} |\lambda(E_j \cap A_i)|$$
$$= \sum_{i=1}^{\infty} \sum_{j=1}^{n} |\lambda(E_j \cap A_i)| \le \sum_{i=1}^{\infty} |\lambda|(A_i).$$

Thus $|\lambda|(A) \le \sum_i |\lambda|(A_i)$. Now to show the reverse inequality, we 
may assume $|\lambda|(A) < \infty$; hence $|\lambda|(A_i) \le |\lambda|(A) < \infty$. For each i, 
there is a partition $\{E_{i1}, \dots, E_{in_i}\}$ of $A_i$ such that 

$$\sum_{j=1}^{n_i} |\lambda(E_{ij})| > |\lambda|(A_i) - \frac{\varepsilon}{2^i}, \quad \varepsilon > 0 \text{ preassigned.}$$

Then for any n, 

$$|\lambda|(A) \ge \sum_{i=1}^{n} \sum_{j=1}^{n_i} |\lambda(E_{ij})| \ge \sum_{i=1}^{n} |\lambda|(A_i) - \varepsilon.$$

Since n and $\varepsilon$ are arbitrary, the result follows. 

(b) If $E_1, \dots, E_n$ are disjoint measurable subsets of A, 

$$\sum_{i=1}^{n} |\lambda_1(E_i) + \lambda_2(E_i)| \le \sum_{i=1}^{n} |\lambda_1(E_i)| + \sum_{i=1}^{n} |\lambda_2(E_i)|$$
$$\le |\lambda_1|(A) + |\lambda_2|(A),$$

proving $|\lambda_1 + \lambda_2| \le |\lambda_1| + |\lambda_2|$; $|a\lambda| = |a|\,|\lambda|$ is immediate from the 
definition of total variation. 

(c) If $\mu(A_i) = 0$ and $|\lambda_i|(A_i^c) = 0$, i = 1, 2, then $\mu(A_1 \cup A_2) = 0$ and by 
(b), $|\lambda_1 + \lambda_2|(A_1^c \cap A_2^c) \le |\lambda_1|(A_1^c) + |\lambda_2|(A_2^c) = 0$. 

(d) This has been established when $\lambda$ is real (see 2.2.5), so assume $\lambda$ 
complex, say, $\lambda = \lambda_1 + i\lambda_2$. If $\mu(A) = 0$, then $\lambda_1(A) = \lambda_2(A) = 0$; 
hence $\lambda \ll \mu$ implies $\lambda_1 \ll \mu$ and $\lambda_2 \ll \mu$. By 2.2.5(b), $|\lambda_1| \ll \mu$, 
$|\lambda_2| \ll \mu$; hence by (b), $|\lambda| \ll \mu$. The converse is clear since 
$|\lambda(A)| \le |\lambda|(A)$. 

(e) The proof is the same as in 2.2.5(c). 

(f) See 2.2.5(d). 

(g) The “if” part is done as in 2.2.5(e); for the “only if” part, let $\mu(A_n) \to 0$. 
Since $|\lambda| \ll \mu$ by (d), $|\lambda|(A_n) \to 0$ by 2.2.5(e); hence $\lambda(A_n) \to 0$. 

Section 2.3 


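2. We have 

$$\int_a^b f'(x)\,dx = \int_a^b \lim_{h \to 0} \Big[\frac{f(x + h) - f(x)}{h}\Big]\,dx \quad \text{where we may assume } h > 0$$
$$\le \liminf_{h \to 0} \int_a^b \Big[\frac{f(x + h) - f(x)}{h}\Big]\,dx \quad \text{by Fatou's lemma}$$
$$= \liminf_{h \to 0} \frac{1}{h}\Big[\int_b^{b+h} f(x)\,dx - \int_a^{a+h} f(x)\,dx\Big]$$

[define $f(x) = f(b)$, $x > b$; $f(x) = f(a)$, $x < a$] 

$$\le \liminf_{h \to 0} \frac{1}{h}[h f(b + h) - h f(a)] \quad \text{since } f \text{ is increasing}$$
$$= f(b) - f(a).$$

Alternatively, let $\mu$ be the Lebesgue-Stieltjes measure corresponding to 
f (adjusted so as to be right continuous), and let m be Lebesgue measure. 
Write $\mu = \mu_1 + \mu_2$, where $\mu_1 \ll m$ and $\mu_2 \perp m$. By 2.3.8 and 2.3.9, 

$$\int_a^b f'(x)\,dx = \int_a^b f'\,dm = \int_a^b D\mu\,dm = \int_a^b \frac{d\mu_1}{dm}\,dm$$
$$= \mu_1(a, b] \le \mu(a, b] = f(b) - f(a).$$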


6. (a) Since A is linear and m is translation-invariant, so is $\lambda$; hence by 
Problem 5 in 1.4, $\lambda = c(A)m$ for some constant c(A). Now if $A_1$ and 
$A_2$ are linear transformations on $\mathbb{R}^k$, then 

$$m(A_1 A_2 E) = c(A_1)\,m(A_2 E) = c(A_1)\,c(A_2)\,m(E)$$

and 

$$m(A_1 A_2 E) = c(A_1 A_2)\,m(E);$$

hence 

$$c(A_1 A_2) = c(A_1)\,c(A_2).$$

Since $\det(A_1 A_2) = \det A_1 \det A_2$, it suffices to assume that A corresponds 
to an elementary row operation. Now if Q is the unit 
cube $\{x : 0 < x_i \le 1,\ i = 1, \dots, k\}$, then m(Q) = 1; hence $c(A) = m(A(Q))$. 
If $e_1, \dots, e_k$ is the standard basis for $\mathbb{R}^k$, then A falls 
into one of the following three categories: 

(1) $Ae_i = e_j$, $Ae_j = e_i$, $Ae_k = e_k$, $k \ne i$ or j. Then $c(A) = 1 = |\det A|$. 

(2) $Ae_i = ke_i$, $Ae_j = e_j$, $j \ne i$. Then $c(A) = |k| = |\det A|$. 

(3) $Ae_i = e_i + e_j$, $Ae_k = e_k$, $k \ne i$. Then det A = 1 and c(A) is 1 also, 
by the following argument. We may assume i = 1, j = 2. Then 

$$A(Q) = \Big\{A \sum_{i=1}^{k} a_i e_i : 0 < a_i \le 1,\ i = 1, \dots, k\Big\}$$
$$= \Big\{y = \sum_{i=1}^{k} b_i e_i : 0 < b_i \le 1,\ i \ne 2,\ b_1 < b_2 \le b_1 + 1\Big\}.$$

If $B_1 = \{y \in A(Q) : b_2 \le 1\}$, $B_2 = \{y \in A(Q) : b_2 > 1\}$, and 
$B_2 - e_2 = \{y - e_2 : y \in B_2\}$, then $B_1 \cap (B_2 - e_2) = \emptyset$; for if $y \in B_1$, 
then $b_1 < b_2$ and if $y + e_2 \in B_2 \subset A(Q)$, then $b_2 + 1 \le b_1 + 1$. Therefore, 

$$c(A) = m(A(Q)) = m(B_1) + m(B_2) = m(B_1) + m(B_2 - e_2)$$
$$= m(B_1 \cup (B_2 - e_2)) = m(Q) = 1.$$

(b) Fix $x \in V$ and define $S(y) = T(x + y) - T(x)$, $y \in \{z - x : z \in V\}$. 
Then if C is a cube containing 0, we have 

$$T(x + C) = \{T(x + y) : y \in C\} = T(x) + \{S(y) : y \in C\}$$
$$= T(x) + S(C).$$

Therefore, $m(T(x + C)) = m(S(C))$, so that differentiability of T at 
x is equivalent to differentiability of S at 0, and the derivatives, if 
they exist, are equal. Also, the Jacobian matrix of S at 0 coincides 
with the Jacobian matrix of T at x, and the result follows. 

(c) Let A = A(0), and define $S(x) = A^{-1}(T(x))$, $x \in V$; the Jacobian matrix 
of S at 0 is the identity matrix, and $S(0) = A^{-1}(0) = 0$. If we 
show that the measure given by m(S(E)), $E \in \mathcal{B}(V)$, is differentiable 
at 0 with derivative 1, then 

$$\Big|\frac{m(S(C))}{m(C)} - 1\Big| < \frac{\varepsilon}{|\det A|}$$

for sufficiently small cubes C containing 0. Thus by (a), $m(T(C)) = m(AS(C)) = |\det A|\,m(S(C))$; hence 

$$\Big|\frac{m(T(C))}{m(C)} - |\det A|\Big| = \Big|\frac{m(S(C))}{m(C)} - 1\Big|\,|\det A| < \varepsilon,$$

so that $\mu$ is differentiable at 0 with derivative $|\det A|$. 
(d) (i) If $x \in C$, then $|x| \le \sqrt{k}\,\beta < \delta$; hence $|T(x) - x| \le \alpha\beta = \tfrac12(\beta_2 - \beta)$. 
Therefore $T(x) \in C_2$. 

(ii) If $x \in \partial C$, then $|T(x) - x| \le \alpha\beta$ as above, and $\alpha\beta = \tfrac12(\beta - \beta_1)$. 
Therefore $T(x) \notin C_1$. 

(iii) We have $|T(x) - x| \le \alpha\beta$, and 

$$\alpha\beta = \frac{\alpha}{1 - 2\alpha}\,\beta_1 < \frac{1/4}{1 - 1/2}\,\beta_1 = \tfrac12 \beta_1,$$

and the result follows. 

(iv) If $y \in C_1 - T(C)$ but $y \in \overline{T(C)}$, then $y \in T(\partial C)$, contradicting (ii). 

Now $C_1$ is the disjoint union of the sets $C_1 \cap T(C)$ and $C_1 - T(C)$; 
the first set is open since T is an open map, and the second set is open 
by (iv). By (iii), $C_1 \cap T(C) \ne \emptyset$, so by connectedness of $C_1$, we have 
$C_1 = C_1 \cap T(C)$, that is, $C_1 \subset T(C)$. Also, $T(C) \subset C_2$ by (i). 
Therefore $m(C_1) \le m(T(C)) \le m(C_2)$, that is, 

$$(1 - 2\alpha)^k \beta^k \le m(T(C)) \le (1 + 2\alpha)^k \beta^k.$$

Thus $1 - \varepsilon < (1 - 2\alpha)^k \le m(T(C))/m(C) \le (1 + 2\alpha)^k < 1 + \varepsilon$ 
as desired. 
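(e) Assume m(E) = 0 and $\lambda(E) > 0$. By 1.4.11, and the fact that 

$$E = \bigcup_{n=1}^{\infty} \bigcup_{j=1}^{\infty} \{x \in E : \lambda(C) < n\,m(C) \text{ for all open cubes } C \text{ containing } x \text{ with } \operatorname{diam} C < 1/j\},$$

we can find a compact set K and positive integers n and j such 
that m(K) = 0, $\lambda(K) > 0$, and $\lambda(C) < n\,m(C)$ for all open cubes C 
containing a point of K and having diameter less than 1/j. If $\varepsilon > 0$, 
choose an open set $D \supset K$ such that $m(D) < \varepsilon$. 

Now partition $\mathbb{R}^k$ into disjoint (partially closed) cubes B of diameter 
less than 1/j and small enough so that if $B \cap K \ne \emptyset$, then $B \subset D$. 
If the cubes that meet K are $B_1, \dots, B_t$, we may find open cubes 
$C_i \supset B_i$, $1 \le i \le t$, such that $m(C_i) \le 2m(B_i)$ and $\operatorname{diam} C_i < 1/j$. 
Then 

$$\lambda(K) \le \sum_{i=1}^{t} \lambda(B_i) \le \sum_{i=1}^{t} \lambda(C_i) \le n \sum_{i=1}^{t} m(C_i) \le 2n \sum_{i=1}^{t} m(B_i)$$
$$\le 2n\,m(D) \le 2n\varepsilon.$$

Since $\varepsilon$ is arbitrary, $\lambda(K) = 0$, a contradiction. 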
(f) If f is an indicator $I_B$, let $E = T^{-1}(B)$, so that B = T(E). Then 

$$\int f(y)\,dy = m(T(E)) \quad \text{and} \quad \int f(T(x))\,|J(x)|\,dx = \int_E |J(x)|\,dx$$

and the result follows from part (e). The usual passage to simple 
functions, nonnegative measurable functions and arbitrary measurable 
functions completes the proof. 


Section 2.4 


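1. (a) If $f = \{a_1, \dots, a_n, 0, 0, \dots\}$, then $\int_\Omega f\,d\mu = \sum_{i=1}^{n} a_i$ by definition 
of the integral of a simple function. If $f = \{a_n, n = 1, 2, \dots\}$, with 
all $a_n \ge 0$, $\int_\Omega f\,d\mu = \sum_{n=1}^{\infty} a_n$ by the result for simple functions and 
the monotone convergence theorem. If the $a_n$ are real numbers, then 
$\int_\Omega f\,d\mu = \int_\Omega f^+\,d\mu - \int_\Omega f^-\,d\mu = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-$ if this is 
not of the form $+\infty - \infty$. Finally, if the $a_n$ are complex, 

$$\int_\Omega f\,d\mu = \sum_{n=1}^{\infty} \operatorname{Re} a_n + i \sum_{n=1}^{\infty} \operatorname{Im} a_n$$

provided Re f and Im f are integrable; since $|\operatorname{Re} a_n|, |\operatorname{Im} a_n| \le |a_n| \le |\operatorname{Re} a_n| + |\operatorname{Im} a_n|$, 
this is equivalent to $\sum_{n=1}^{\infty} |a_n| < \infty$. 

(b) If $f(\alpha) = 0$ except for $\alpha \in F$, F finite, then $\int_\Omega f\,d\mu = \sum_\alpha f(\alpha)$ by 
definition of the integral of a simple function. If $f \ge 0$, then $\int_\Omega f\,d\mu \ge \int_F f\,d\mu$ 
for all finite F; hence $\int_\Omega f\,d\mu \ge \sum_\alpha f(\alpha)$. If $f(\alpha) > 0$ for 
uncountably many $\alpha$, then $\sum_\alpha f(\alpha) = \infty$; hence $\int_\Omega f\,d\mu = \infty$ also. 
If $f(\alpha) > 0$ for only countably many $\alpha$, then $\int_\Omega f\,d\mu = \sum_\alpha f(\alpha)$ by 
the monotone convergence theorem. The remainder of the proof is 
as in part (a). 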


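Apply Hölder's inequality with g = 1, f replaced by $|f|^r$, p = s/r, 1/q 
= 1 − r/s, to obtain 

$$\int_\Omega |f|^r\,d\mu \le \Big(\int_\Omega |f|^s\,d\mu\Big)^{r/s} \Big(\int_\Omega 1\,d\mu\Big)^{1 - r/s}.$$

Therefore $\|f\|_r \le \|f\|_s\,[\mu(\Omega)]^{1/r - 1/s}$, as desired. 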


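We have $\int_\Omega |f|^p\,d\mu \le \|f\|_\infty^p\,\mu(\Omega)$, so $\limsup_{p \to \infty} \|f\|_p \le \|f\|_\infty$. Now 
let $\varepsilon > 0$, $A = \{\omega : |f| \ge \|f\|_\infty - \varepsilon\}$ (assuming $\|f\|_\infty < \infty$). Then 

$$\int_\Omega |f|^p\,d\mu \ge \int_A |f|^p\,d\mu \ge (\|f\|_\infty - \varepsilon)^p\,\mu(A).$$

Since $\mu(A) > 0$ by definition of $\|f\|_\infty$, we have $\liminf_{p \to \infty} \|f\|_p \ge \|f\|_\infty - \varepsilon$, 
and since $\varepsilon$ is arbitrary, $\liminf_{p \to \infty} \|f\|_p \ge \|f\|_\infty$. If $\|f\|_\infty = \infty$, 
let $A = \{\omega : |f(\omega)| \ge M\}$ and show that 

$$\liminf_{p \to \infty} \|f\|_p \ge M;$$

since M can be arbitrarily large, the result follows. 
If $\mu(\Omega) = \infty$, it is still true that $\liminf_{p \to \infty} \|f\|_p \ge \|f\|_\infty$; if $\mu(A) = \infty$ 
in the above argument, then $\|f\|_p = \infty$ for all $p < \infty$. However, if 
$\mu$ is Lebesgue measure on $\mathcal{B}(\mathbb{R})$ and f(x) = 1 for $n \le x \le n + (1/n)$, 
n = 1, 2, ..., and f(x) = 0 elsewhere, then $\|f\|_p = \infty$ for $p < \infty$, but 
$\|f\|_\infty = 1$. 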
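The convergence $\|f\|_p \to \|f\|_\infty$ on a finite measure space can be observed on a space with finitely many points. The sketch below is illustrative: the function name `lp_norm` and the sample values and weights are assumptions. The largest value is factored out before raising to the power p, which avoids floating-point overflow for large p.

```python
def lp_norm(values, weights, p):
    """L^p norm of a function taking values[i] on a point of mass weights[i]."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return 0.0
    # Factor out the largest value so that every ratio lies in [0, 1].
    scaled = sum(w * (abs(v) / peak) ** p for v, w in zip(values, weights))
    return peak * scaled ** (1.0 / p)

values = [0.5, 2.0, 3.0]     # illustrative function values
weights = [0.2, 0.3, 0.5]    # illustrative point masses, total mass 1

norms = [lp_norm(values, weights, p) for p in (1, 2, 10, 100, 1000)]
print(norms)
```

Since every point has positive mass, the essential supremum is 3, and the printed norms increase toward 3 as p grows.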
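11. (a) We have $|\int_E (f - z)\,d\mu| \le \int_E |f - z|\,d\mu$. But $\omega \in E$ implies 
$f(\omega) \in D$; hence $|f(\omega) - z| \le r$; and thus $\int_E |f - z|\,d\mu \le r\mu(E)$. 
If $\mu(E) > 0$, then 

$$\Big|\frac{1}{\mu(E)} \int_E f\,d\mu - z\Big| = \Big|\frac{1}{\mu(E)} \int_E (f - z)\,d\mu\Big| \le r;$$

hence $[1/\mu(E)] \int_E f\,d\mu \in D \subset S^c$, a contradiction. Therefore $\mu(E) = 0$, 
that is, $\mu\{\omega : f(\omega) \in D\} = 0$. Since $\{\omega : f(\omega) \notin S\}$ is a countable 
union of sets $f^{-1}(D)$, the result follows. 

(b) Let $\mu = |\lambda|$; if $E_1, \dots, E_n$ are disjoint measurable subsets of $A_r$, 

$$\sum_{j=1}^{n} |\lambda(E_j)| = \sum_{j=1}^{n} \Big|\int_{E_j} h\,d\mu\Big| \le \sum_{j=1}^{n} \int_{E_j} |h|\,d\mu$$
$$\le r \sum_{j=1}^{n} \mu(E_j) \le r\mu(A_r).$$

Thus $\mu(A_r) \le r\mu(A_r)$, and since 0 < r < 1, we have $\mu(A_r) = 0$. If 
$A = \{\omega : |h(\omega)| < 1\} = \bigcup\{A_r : 0 < r < 1,\ r \text{ rational}\}$, then $\mu(A) = 0$, 
so that $|h| \ge 1$ a.e. 

Now if $\mu(E) > 0$, then $[1/\mu(E)] \int_E h\,d\mu = \lambda(E)/\mu(E) \in S$, where 
$S = \{z \in \mathbb{C} : |z| \le 1\}$. By (a), $h(\omega) \in S$ for almost every $\omega$, so $|h| \le 1$ 
a.e. $[|\lambda|]$. 

(c) If $E \in \mathcal{F}$, $\int_\Omega I_E h\,d|\lambda| = \int_E h\,d|\lambda| = \lambda(E)$ by (b); also, $\int_\Omega I_E g\,d\mu = \int_E g\,d\mu = \lambda(E)$ 
by definition of $\lambda$. It follows immediately that 
$\int_\Omega f h\,d|\lambda| = \int_\Omega f g\,d\mu$ when f is a complex-valued simple function. 
If f is a bounded, complex-valued Borel measurable function, 
by 1.5.5(b) there are simple functions $f_n \to f$ with $|f_n| \le |f|$. 
By the dominated convergence theorem, $\int_\Omega f h\,d|\lambda| = \int_\Omega f g\,d\mu$. If 
$f = \bar{h} I_E$, we obtain $|\lambda|(E) = \int_E \bar{h} g\,d\mu$. 

(d) In (c), $|\lambda|(E) \ge 0$ for all E; hence $\bar{h} g \ge 0$ a.e. $[\mu]$ by 1.6.11. But 
if $g(\omega) = |g(\omega)| e^{i\theta(\omega)}$ and $h(\omega) = e^{i\varphi(\omega)}$, then $e^{i(\theta - \varphi)} = 1$ a.e. on 
$\{g \ne 0\}$, so that $\bar{h} g = |g|$ a.e., as desired. 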


12. If $I_{(a,b)}$ can be approximated in $L^\infty$ by continuous functions, let $0 < \varepsilon < \tfrac12$ 
and let f be a continuous function such that 

$$\|I_{(a,b)} - f\|_\infty \le \varepsilon;$$

hence $|I_{(a,b)} - f| \le \varepsilon$ a.e. For every $\delta > 0$, there are points $x \in (a, a + \delta)$ 
and $y \in (a - \delta, a)$ such that $|1 - f(x)| \le \varepsilon$ and $|f(y)| \le \varepsilon$. Consequently, 
$\limsup_{x \to a^+} f(x) \ge 1 - \varepsilon$ and $\liminf_{x \to a^-} f(x) \le \varepsilon$, contradicting 
continuity of f. 


Section 2.5 


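4. Let $B_{jk\delta} = \{|f_j - f_k| \ge \delta\}$, $B_\delta = \bigcap_{n=1}^{\infty} \bigcup_{j,k=n}^{\infty} B_{jk\delta}$. Then 

$$\bigcup_{j,k=n}^{\infty} B_{jk\delta} \downarrow B_\delta,$$

and the proof proceeds just as in 2.5.4. 


5. Let $\{f_{n_k}\}$ be a subsequence converging a.e., necessarily to f by Problem 1. 
By 1.6.9, f is $\mu$-integrable. Now if $\int_\Omega f_n\,d\mu \not\to \int_\Omega f\,d\mu$, then for 
some $\varepsilon > 0$, we have $|\int_\Omega f_n\,d\mu - \int_\Omega f\,d\mu| \ge \varepsilon$ for n in some subsequence 
$\{m_k\}$. But we may then extract a subsequence $\{f_{r_j}\}$ of $\{f_{m_k}\}$ converging 
to f a.e., so that $\int_\Omega f_{r_j}\,d\mu \to \int_\Omega f\,d\mu$ by 1.6.9, a contradiction. 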


Section 2.6 


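4. By Fubini's theorem, 

$$\mu(C) = \int_\Omega I_C\,d\mu = \int_{\Omega_1} \int_{\Omega_2} I_C\,d\mu_2\,d\mu_1 = \int_{\Omega_1} \mu_2(C(\omega_1))\,d\mu_1(\omega_1).$$

Similarly, $\mu(C) = \int_{\Omega_2} \mu_1(C(\omega_2))\,d\mu_2(\omega_2)$. The result follows since $f \ge 0$, 
$\int_\Omega f = 0$ implies f = 0 a.e. 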
7. (a) Let 

$$A_{nk} = \Big\{x \in \Omega_1 : \frac{k - 1}{n} \le f(x) < \frac{k}{n}\Big\}, \qquad B_{nk} = \Big[\frac{k - 1}{n}, \frac{k}{n}\Big)$$

(n = 1, 2, ..., k = 1, 2, ..., n; when k = n, include the right endpoint 
as well). Then 

$$G = \bigcap_{n=1}^{\infty} \bigcup_{k=1}^{n} (A_{nk} \times B_{nk}) \in \mathcal{F}.$$

If f is only defined on a subset of $\Omega_1$, replace $\Omega_1$ by the domain of 
f in the definition of $A_{nk}$. 

(b) Assume $B \subset C_1$. Each $x \in \Omega_1$ is countable and $(x, y) \in B$ implies 
$y \le x$; hence there are only countably many points $y_{x1}, y_{x2}, \dots \in \Omega_2$ 
such that $(x, y_{xn}) \in B$. (If there are only finitely many points $y_{x1}, \dots, y_{xn}$, 
take $y_{xk} = y_{xn}$ for $k \ge n$.) 

Thus $B = \bigcup_{n=1}^{\infty} G_n$, where $G_n$ is the graph of the function $f_n$ 
defined by $f_n(x) = y_{xn}$. By part (a), $B \in \mathcal{F}$. (Note that $y_{xn} \in \Omega_2$, 
which may be identified with [0, 1]; thus (a) applies.) If $B \subset C_2$, 
each $y \in \Omega_2$ is countable, and $(x, y) \in B$ implies $x \le y$, so there are 
only countably many points $x_{yn} \in \Omega_1$ such that $(x_{yn}, y) \in B$. The 
result follows as above. 

(c) If $F \subset \Omega$, then $F = (F \cap C_1) \cup (F \cap C_2)$ and the result follows from 
part (b). 
Assuming the continuum hypothesis, we may replace [0, 1] by the first uncountable 
ordinal $\beta_1$. Thus we may take $\Omega_1 = \Omega_2 = \beta_1$, $\mathcal{F}_1 = \mathcal{F}_2$ 
= the image of the Borel sets under the correspondence of [0, 1] with 
$\beta_1$, and $\mu_1 = \mu_2$ = Lebesgue measure. Let $f = I_C$, where $C = \{(x, y) \in \Omega_1 \times \Omega_2 : y \le x\}$ 
and the ordering “$\le$” is taken in $\beta_1$, not [0, 1]. For 
each x, $\{y : f(x, y) = 1\}$ is countable, and for each y, $\{x : f(x, y) = 0\}$ is 
countable; it follows that f is measurable in each coordinate separately. 

Now $\int_0^1 f(x, y)\,dy = 0$ for all x; hence $\int_0^1 [\int_0^1 f(x, y)\,dy]\,dx = 0$. But 
$\int_0^1 [1 - f(x, y)]\,dx = 0$ for all y; hence $\int_0^1 [\int_0^1 f(x, y)\,dx]\,dy = 1$. It 
follows that f is not jointly measurable, for if so, the iterated integrals 
would be equal by Fubini's theorem. 


Section 2.7 


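1. Let $\mathcal{F}$ be the smallest σ-field containing the measurable rectangles. Then 
$\mathcal{F} \subset \prod_{j=1}^{\infty} \mathcal{F}_j$ since a measurable rectangle is a measurable cylinder. But 
the class of sets $A \subset \prod_{j=1}^{n} \Omega_j$ such that $\{\omega \in \Omega : (\omega_1, \dots, \omega_n) \in A\} \in \mathcal{F}$ 
is a σ-field that contains the measurable rectangles of $\prod_{j=1}^{n} \Omega_j$, and hence 
contains all sets in $\prod_{j=1}^{n} \mathcal{F}_j$. Thus all measurable cylinders belong to $\mathcal{F}$, 
so $\prod_{j=1}^{\infty} \mathcal{F}_j \subset \mathcal{F}$. 

5. 

$$\{x \in \mathbb{R}^\infty : f(x) = n\} = \Big\{x : \sum_{i=1}^{k} x_i < 1,\ k = 1, 2, \dots, n - 1,\ \sum_{i=1}^{n} x_i \ge 1\Big\} \quad \text{if } n = 1, 2, \dots,$$

$$\{x \in \mathbb{R}^\infty : f(x) = \infty\} = \Big\{x : \sum_{i=1}^{n} x_i < 1,\ n = 1, 2, \dots\Big\}.$$

In each case we have a finite or countable intersection of measurable 
cylinders. 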
CHAPTER 3 
Section 3.2 


6. (a) Let $z \in K$, a compact subset of U. If $r < d_0$, a standard application 
of the Cauchy integral formula yields 

$$f(z) = \frac{1}{2\pi} \int_0^{2\pi} f(z + re^{i\theta})\,d\theta.$$

Thus if $0 < d < d_0$, 

$$\frac{d^2}{2} f(z) = \int_0^d r f(z)\,dr = \int_0^d r\,dr\,\frac{1}{2\pi} \int_0^{2\pi} f(z + re^{i\theta})\,d\theta,$$

or 

$$f(z) = \frac{1}{\pi d^2} \int_{\theta=0}^{2\pi} \int_{r=0}^{d} f(z + re^{i\theta})\,r\,dr\,d\theta = \frac{1}{\pi d^2} \iint_D f(x + iy)\,dx\,dy$$

where D is the disk with center at z and radius d. An application 
of the Cauchy-Schwarz inequality to the functions 1 and f shows 
that 

$$|f(z)| \le \frac{1}{\pi d^2} (\pi d^2)^{1/2} \|f\|,$$

as desired. 
(b) If $f_1, f_2, \dots$ is a Cauchy sequence in H(U), part (a) shows that $f_n$ 
converges uniformly on compact subsets to a function f analytic 
on U. But $H(U) \subset L^2(\Omega, \mathcal{F}, \mu)$ where $\Omega = U$, $\mathcal{F}$ is the class of 
Borel sets, and $\mu$ is Lebesgue measure; hence $f_n$ converges in $L^2$ 
to a function $g \in L^2(\Omega, \mathcal{F}, \mu)$. Since a subsequence of $\{f_n\}$ converges to 
g a.e., we have f = g a.e. Therefore $f \in H(U)$ and $f_n \to f$ in $L^2$, 
that is, in the H(U) norm. 


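7. (a) If $0 \le r < 1$, 

$$\frac{1}{2\pi} \int_0^{2\pi} |f(re^{i\theta})|^2\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} \sum_{n=0}^{\infty} a_n r^n e^{in\theta} \sum_{m=0}^{\infty} \bar{a}_m r^m e^{-im\theta}\,d\theta$$
$$= \frac{1}{2\pi} \sum_{n,m} a_n \bar{a}_m r^{n+m} \int_0^{2\pi} e^{i(n-m)\theta}\,d\theta$$

since the Taylor series converges uniformly on compact subsets of D 

$$= \sum_{n=0}^{\infty} |a_n|^2 r^{2n},$$

which increases to $\sum_{n=0}^{\infty} |a_n|^2$ as r increases to 1. 

(b) 
$$\iint_D |f(x + iy)|^2\,dx\,dy = \int_0^1 r\,dr \int_0^{2\pi} |f(re^{i\theta})|^2\,d\theta \le 2\pi N^2(f) \int_0^1 r\,dr = \pi N^2(f).$$

(c) 
$$\iint_D |f_n(x + iy)|^2\,dx\,dy = \int_0^{2\pi} \int_0^1 r^{2n}\,r\,dr\,d\theta = \frac{2\pi}{2n + 2} \to 0,$$

but 

$$\frac{1}{2\pi} \int_0^{2\pi} |f_n(re^{i\theta})|^2\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} r^{2n}\,d\theta = r^{2n};$$

hence $N(f_n) = 1$ for all n. 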
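The two quantities in part (c) can be compared numerically. The sketch below is illustrative only; the function names and the step count are assumptions, and it takes $f_n(z) = z^n$, which is what the displayed integrals correspond to. The area integral is approximated by a midpoint rule in the radial variable.

```python
import math

def area_norm_squared(n, steps=20_000):
    """Midpoint-rule approximation of the integral of |z^n|^2 over the unit disk."""
    h = 1.0 / steps
    # Radial integral of r^(2n) * r over [0, 1]; the angular integral contributes 2*pi.
    radial = sum(((k + 0.5) * h) ** (2 * n + 1) for k in range(steps)) * h
    return 2.0 * math.pi * radial

def circle_mean_square(n, r):
    """Mean of |z^n|^2 over the circle of radius r, which equals r^(2n)."""
    return r ** (2 * n)

for n in (1, 3, 50):
    print(n, area_norm_squared(n), 2.0 * math.pi / (2 * n + 2), circle_mean_square(n, 0.999999))
```

The area integral matches $2\pi/(2n + 2)$ and shrinks as n grows, while the circle means stay close to 1 for r near 1, consistent with $N(f_n) = 1$.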


(d) If $\{f_n\}$ is a Cauchy sequence in $H^2$, part (b) shows that $\{f_n\}$ is 
Cauchy in H(D). By Problem 6, $f_n$ converges uniformly on compact 
subsets and in H(D) to a function f analytic on D. Now if 
$0 < r_0 < 1$, $\varepsilon > 0$, the Cauchy property in $H^2$ gives 

$$\frac{1}{2\pi} \int_0^{2\pi} |f_n(re^{i\theta}) - f_m(re^{i\theta})|^2\,d\theta \le \varepsilon$$

for $r \le r_0$ and n, m exceeding some integer $N(\varepsilon)$. Let $m \to \infty$; 
since $f_m \to f$ uniformly for $|z| \le r_0$, 

$$\frac{1}{2\pi} \int_0^{2\pi} |f_n(re^{i\theta}) - f(re^{i\theta})|^2\,d\theta \le \varepsilon, \quad r \le r_0,\ n \ge N(\varepsilon).$$

Since $r_0$ may be chosen arbitrarily close to 1, $N^2(f_n - f) \le \varepsilon$ for 
$n \ge N(\varepsilon)$, proving completeness. 

(e) Since $e_n$ corresponds to (0, ..., 0, 1, 0, ...), with the 1 in position 
n, in the isometric isomorphism between $H^2$ and a subspace of 
$l^2$, the $e_n$ are orthonormal. Now if $f \in H^2$, with Taylor coefficients 
$a_n$, n = 0, 1, ..., then $\langle f, e_n \rangle = a_n$, again by the isometric 
isomorphism. Thus 

$$N^2(f) = \sum_{n=0}^{\infty} |a_n|^2 = \sum_{n=0}^{\infty} |\langle f, e_n \rangle|^2$$

and the result follows from 3.2.13(f). 
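(f)
\[
\langle e_n, e_m\rangle = \iint_D e_n(x+iy)\,\overline{e_m(x+iy)}\,dx\,dy
= \iint_D e_n(re^{i\theta})\,\overline{e_m(re^{i\theta})}\,r\,dr\,d\theta
\]
\[
= [(2n+2)(2m+2)]^{1/2}\,\frac{1}{2\pi}\int_0^{2\pi}\!\!\int_0^1 r^{n+m+1} e^{i(n-m)\theta}\,dr\,d\theta
= \begin{cases} 1, & n = m,\\ 0, & n \ne m.\end{cases}
\]
Thus the $e_n$ are orthonormal. Now if $f \in H(D)$ with
\[
f(z) = \sum_{n=0}^{\infty} a_n z^n,
\]
then
\[
\int_0^{2\pi}\!\!\int_0^{r_0} f(re^{i\theta})\,\overline{e_m(re^{i\theta})}\,r\,dr\,d\theta
= \sum_{n=0}^{\infty} a_n \int_0^{2\pi}\!\!\int_0^{r_0} r^n e^{in\theta} \left(\frac{2m+2}{2\pi}\right)^{1/2} r^m e^{-im\theta}\,r\,dr\,d\theta
\]
since the Taylor series converges uniformly on compact subsets of $D$
\[
= a_m \left(\frac{2m+2}{2\pi}\right)^{1/2} 2\pi\,\frac{r_0^{2m+2}}{2m+2}
= a_m \left(\frac{2\pi}{2m+2}\right)^{1/2} r_0^{2m+2}.
\]
Now $f\bar{e}_m$ is integrable on $D$ (by the Cauchy-Schwarz inequality),
so we may let $r_0 \to 1$ and invoke the dominated convergence
theorem to obtain
\[
\langle f, e_m\rangle = a_m \left(\frac{2\pi}{2m+2}\right)^{1/2}.
\]
But the same argument with $e_m$ replaced by $f$ shows that
\[
\|f\|^2 = \lim_{r_0 \to 1} \sum_{n,m=0}^{\infty} a_n \bar{a}_m \int_0^{2\pi}\!\!\int_0^{r_0} r^n e^{in\theta}\, r^m e^{-im\theta}\,r\,dr\,d\theta
= \sum_{n=0}^{\infty} |a_n|^2\,\frac{2\pi}{2n+2}.
\]
The result now follows from 3.2.13(f).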


(a) Let $g$ be a continuous complex-valued function on $[0, 2\pi]$ with $g(0) = g(2\pi)$.
Then $g(t) = h(e^{it})$, where $h(z)$ is continuous on $\{z \in C: |z| = 1\}$.
By the Stone-Weierstrass theorem, $h$ can be uniformly
approximated by functions of the form $\sum_{k=-n}^{n} c_k z^k$. For the algebra
generated by $z^n$, $n = 0, \pm 1, \pm 2, \ldots$, separates points, contains the
constant functions, and contains the complex conjugate of each of
its members since $\bar{z} = 1/z$ for $|z| = 1$.

Thus $g(t)$ can be uniformly approximated (hence approximated in
$L^2$) by trigonometric polynomials $\sum_{k=-n}^{n} c_k e^{ikt}$. Since any continuous
function on $[0, 2\pi]$ can be approximated in $L^2$ by a continuous
function $g$ with $g(0) = g(2\pi)$, and the continuous functions are dense
in $L^2$, the trigonometric polynomials are dense in $L^2$.

(b) By 3.2.6, $\int_0^{2\pi} |f(t) - \sum_{k=-n}^{n} c_k e^{ikt}|^2\,dt$ is minimized when $c_k = a_k
= (1/2\pi)\int_0^{2\pi} f(t) e^{-ikt}\,dt$. Furthermore, some sequence of trigonometric
polynomials converges to $f$ in $L^2$ since the trigonometric
polynomials are dense. The result follows.

(c) This follows from part (a) and 3.2.13(c), or, equally well, from part
(b) and 3.2.13(d).

10. (a) Let $\{e_n\}$ be an infinite orthonormal subset of $H$. Take $M = \{x_1, x_2, \ldots\}$,
where $x_n = (1 + 1/n)e_n$, $n = 1, 2, \ldots$. To show that $M$ is closed, we
compute, for $n \ne m$,
\[
\|x_n - x_m\|^2 = \left\|\left(1 + \frac{1}{n}\right)e_n - \left(1 + \frac{1}{m}\right)e_m\right\|^2
= \left(1 + \frac{1}{n}\right)^2 + \left(1 + \frac{1}{m}\right)^2 > 2.
\]
Thus if $y_n \in M$, $y_n \to y$, then $y_n = y$ eventually, so $y \in M$. Since
$\|x_n\| = 1 + (1/n)$, $M$ has no element of minimum norm.

(b) Let $M$ be a nonempty closed subset of the finite-dimensional
space $H$. If $x \in H$ and $N = M \cap \{y: \|x - y\| \le n\}$, then $N \ne \emptyset$ for
some $n$. Since $y \to \|x - y\|$, $y \in N$, is continuous and $N$ is compact,
$\inf\{\|x - y\|: y \in N\} = \|x - y_0\|$ for some $y_0 \in N \subset M$. But the inf
over $N$ is the same as the inf over $M$; for if $y \in M$, $y \notin N$, then
$\|x - y_0\| \le n < \|x - y\|$. Note that $y_0$ need not be unique; for example,
let $H = R$, $M = \{-1, 1\}$, $x = 0$.


3. Since $\int_a^b |K(s, t)|\,dt$ is continuous in $s$, it assumes a maximum at some
point $u \in [a, b]$. If $K(u, t) = r(t)e^{i\theta(t)}$, $r(t) \ge 0$, let $z(t) = e^{-i\theta(t)}$. Let $x_1,
x_2, \ldots$ be a sequence in $C[a, b]$ such that $\int_a^b |x_n(t) - z(t)|\,dt \to 0$; we
may assume that $|x_n(t)| \le 1$ for all $n$ and $t$ (see 2.4.14). Since $K$ is
bounded,
\[
\left|\int_a^b K(s, t) z(t)\,dt\right| = \lim_{n \to \infty} \left|\int_a^b K(s, t) x_n(t)\,dt\right|
= \lim_{n \to \infty} |(Ax_n)(s)| \le \|A\|.
\]
Set $s = u$ to obtain
\[
\int_a^b |K(u, t)|\,dt \le \|A\|,
\]
as desired.




7. 


11. 


(a) If $x \in L$, then
\[
\|x\|_1 = \Big\|\sum_{i=1}^n x_i e_i\Big\|_1 \le \sum_{i=1}^n |x_i|\,\|e_i\|_1 \le \Big(\max_i \|e_i\|_1\Big) \sum_{i=1}^n |x_i|.
\]
But $\sum_{i=1}^n |x_i| \le \sqrt{n}\,\|x\|_2$, so we may take $k = \sqrt{n}\,\max_i \|e_i\|_1$.

(b) Let $S = \{x: \|x\|_2 = 1\}$. Since $(L, \|\ \|_2)$ is isometrically isomorphic
to $C^n$, $S$ is compact in the norm $\|\ \|_2$. Now the map $x \to \|x\|_1$ is
a continuous real-valued function on $(L, \|\ \|_1)$, and by part (a), the
topology induced by $\|\ \|_1$ is weaker than the topology induced by
$\|\ \|_2$. Thus the map is continuous on $(L, \|\ \|_2)$; hence it attains a
minimum $m$ on $S$, necessarily positive since $x \in S$ implies $x \ne 0$.

(c) If $x \in L$, $x \ne 0$, let $y = x/\|x\|_2$; then $\|y\|_1 \ge m\|y\|_2$ by (b), hence
$\|x\|_1 \ge m\|x\|_2$. By (a) and Problem 6(b), $\|\ \|_1$ and $\|\ \|_2$ induce the
same topology.

(d) By the above results, the map $T: \sum_{i=1}^n x_i e_i \to (x_1, \ldots, x_n)$ is a
one-to-one onto, linear, bicontinuous map of $L$ and $C^n$ [note that
$\|\sum_{i=1}^n x_i e_i\|_2$ is the Euclidean norm of $(x_1, \ldots, x_n)$ in $C^n$]. If $y_j \in L$,
$y_j \to y \in M$, then $y_j - y_k \to 0$ as $j, k \to \infty$; hence $T(y_j - y_k) =
Ty_j - Ty_k \to 0$. Thus $\{Ty_j\}$ is a Cauchy sequence in $C^n$. If $Ty_j \to
z \in C^n$, then $y_j \to T^{-1}z \in L$.
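As a concrete instance of parts (a) and (c), the sketch below takes $L = C^n$ with the standard basis and uses the $\ell^1$ norm in the role of $\|\ \|_1$ (an assumption made only for illustration; the problem allows any norm). In that case $\max_i \|e_i\|_1 = 1$, so $k = \sqrt{n}$, and the lower constant is $m = 1$.

```python
import math
import random

def norm1(x):
    """The l^1 norm, standing in for the arbitrary norm || ||_1 of the problem."""
    return sum(abs(c) for c in x)

def norm2(x):
    """Euclidean norm of the coordinate vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def check_bounds(n, trials=1000, seed=0):
    """Check ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2 on random complex vectors."""
    rng = random.Random(seed)
    k = math.sqrt(n)  # k = sqrt(n) * max_i ||e_i||_1, and ||e_i||_1 = 1 here
    for _ in range(trials):
        x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        if not (norm2(x) - 1e-12 <= norm1(x) <= k * norm2(x) + 1e-12):
            return False
    return True

if __name__ == "__main__":
    print(check_bounds(5))
```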


For (a) implies (b), see Problem 7; if (c) holds, then $\{x: \|x\| \le \varepsilon\}$ is
compact for small enough $\varepsilon > 0$; hence every closed ball is compact (note
that the map $x \to kx$ is a homeomorphism). But any closed bounded set
is a subset of a closed ball, and hence is compact.

To prove that (f) implies (a), choose $x_1 \in L$ such that $\|x_1\| = 1$. Suppose
we have chosen $x_1, \ldots, x_k \in L$ such that $\|x_i\| = 1$ and $\|x_i - x_j\| \ge \frac{1}{2}$ for
$i, j = 1, \ldots, k$, $i \ne j$. If $L$ is not finite-dimensional, then $S\{x_1, \ldots, x_k\}$ is a
proper subspace of $L$, necessarily closed, by Problem 7(d). By Problem 8,
we can find $x_{k+1} \in L$ with $\|x_{k+1}\| = 1$ and $\|x_i - x_{k+1}\| \ge \frac{1}{2}$, $i = 1, \ldots, k$.
The sequence $x_1, x_2, \ldots$ satisfies $\|x_n\| = 1$ for all $n$, but $\|x_n - x_m\| \ge \frac{1}{2}$ for
$n \ne m$; hence the unit sphere cannot possibly be covered by a finite number
of balls of radius less than $\frac{1}{4}$.


(a) Define $\lambda(A) = f(I_A)$, $A \in \mathcal{F}$. If $A_1, A_2, \ldots$ are disjoint sets in $\mathcal{F}$
whose union is $A$, then $\lambda(A) = \sum_{i=1}^{\infty} f(I_{A_i})$ since $f$ is continuous
and $\sum_{i=1}^{n} I_{A_i} \xrightarrow{L^p} I_A$. [Note that
\[
\int_\Omega \Big|\sum_{i=n+1}^{\infty} I_{A_i}\Big|^p d\mu = \sum_{i=n+1}^{\infty} \mu(A_i) \to 0
\]
by finiteness of $\mu$.] Thus $\lambda$ is a complex measure on $\mathcal{F}$. If $\mu(A) = 0$,
then $I_A = 0$ a.e. $[\mu]$, so we may write $I_A \xrightarrow{L^p} 0$ and use the continuity
of $f$ to obtain $\lambda(A) = 0$. By the Radon-Nikodym theorem,
we have $\lambda(A) = \int_A y\,d\mu$ for some $\mu$-integrable $y$. Thus $f(x) =
\int_\Omega xy\,d\mu$ when $x$ is an indicator; hence when $x$ is a simple function.
Since $f$ is continuous, $y$ is $\mu$-integrable, and the finite-valued simple
functions are dense in $L^p$, the result holds when $x$ is a bounded
Borel measurable function.

Now let $y_1, y_2, \ldots$ be nonnegative, finite-valued, simple functions
increasing to $|y|$. Then
\[
\|y_n\|_q^q = \int_\Omega y_n^q\,d\mu \le \int_\Omega y_n^{q-1}|y|\,d\mu = \int_\Omega y_n^{q-1} e^{-i \arg y}\, y\,d\mu
\]
\[
= f(y_n^{q-1} e^{-i \arg y}) \qquad \text{since } y_n^{q-1} e^{-i \arg y} \text{ is bounded}
\]
\[
\le \|f\|\,\|y_n^{q-1}\|_p = \|f\|\,\|y_n\|_q^{q/p} \qquad \text{since } (q-1)p = q.
\]
Thus $\|y_n\|_q \le \|f\|$; hence by the monotone convergence theorem,
$\|y\|_q \le \|f\|$; in particular, $y \in L^q$. But now Hölder's inequality and
the fact that finite-valued simple functions are dense in $L^p$ yield
$f(x) = \int_\Omega xy\,d\mu$ for all $x \in L^p$. Hölder's inequality also gives $\|f\|
\le \|y\|_q$; hence $\|f\| = \|y\|_q$. If $y_1$ is another such function, then $g(x)
= \int_\Omega x(y - y_1)\,d\mu = 0$ for all $x \in L^p$. By the above argument,
$\|y - y_1\|_q = 0$, so $y = y_1$ a.e. $[\mu]$.


(b) (i) If, say, $y_A - y_B > 0$ on the set $F \subset A \cap B$, let $x = I_F$; then $xI_A
= xI_B$; hence $\int_\Omega x(y_A - y_B)\,d\mu = 0$, that is,
\[
\int_F (y_A - y_B)\,d\mu = 0.
\]
But then $\mu(F) = 0$.

(ii) Since $y_{A_n \cup A_m} = y_{A_n}$ a.e. on $A_n$, we have
\[
\|y_{A_n}\|_q^q \le \|y_{A_n \cup A_m}\|_q^q = \|y_{A_n}\|_q^q + \int_{A_m - A_n} |y_{A_m}|^q\,d\mu.
\]
Since $\|y_{A_n}\|_q^q$ approaches $k^q$ as $n \to \infty$, so does $\|y_{A_n \cup A_m}\|_q^q$, and
it follows that $\int_{A_m - A_n} |y_{A_m}|^q\,d\mu \to 0$ as $n \to \infty$. By symmetry,
we may interchange $m$ and $n$ to obtain
\[
\int_\Omega |y_{A_n} - y_{A_m}|^q\,d\mu = \int_{A_m - A_n} |y_{A_m}|^q\,d\mu + \int_{A_n - A_m} |y_{A_n}|^q\,d\mu \to 0
\]
as $n, m \to \infty$. Thus $y_{A_n}$ converges in $L^q$ to a limit $y$, and since
$\|y_{A_n}\|_q \le \|f\|$ for all $n$, $\|y\|_q \le \|f\|$. If $\{B_n\}$ is another sequence
of sets with $\|y_{B_n}\|_q \to k$, the above argument with $A_m$ replaced
by $B_n$ shows that $\|y_{A_n} - y_{B_n}\|_q \to 0$; hence $y_{B_n} \to y$ also.

(iii) Let $A \in \mathcal{F}$, $\mu(A) < \infty$. In (ii) we may take all $A_n \supset A$, so that
$y_{A_n} = y_A$ a.e. on $A$; hence $y = y_A$ a.e. on $A$. Thus if $x = I_A$,
then $f(x) = f(xI_A) = \int_\Omega x y_A\,d\mu = \int_\Omega xy\,d\mu$. It follows that $f(x)
= \int_\Omega xy\,d\mu$ if $x$ is simple. [If $\mu(B) = \infty$, then $x$ must be 0 on
$B$ since $x \in L^p$.] Since $y \in L^q$, the continuity of $f$ and Hölder's
inequality extend this result to all $x \in L^p$.


(c) The argument of (a) yields a $\mu$-integrable $y$ such that $f(x) = \int_\Omega xy\,d\mu$
for all bounded Borel measurable $x$. Let $B = \{\omega: |y(\omega)| \ge k\}$;
then
\[
k\mu(B) \le \int_B |y|\,d\mu = \int_\Omega I_B e^{-i \arg y}\, y\,d\mu
= f(I_B e^{-i \arg y}) \le \|f\|\,\|I_B\|_1 = \|f\|\,\mu(B).
\]
Thus if $k > \|f\|$, we have $\mu(B) = 0$, proving that $y \in L^\infty$ and $\|y\|_\infty
\le \|f\|$. As in (a), we obtain $f(x) = \int_\Omega xy\,d\mu$ for all $x \in L^1$, $\|f\|
= \|y\|_\infty$, and $y$ is essentially unique.

(d) Part (i) of (b) holds, with the same proof. Now if $\Omega$ is the union
of disjoint sets $A_n$, with $\mu(A_n) < \infty$, define $y$ on $\Omega$ by taking
$y = y_{A_n}$ on $A_n$. Since $\|y_{A_n}\|_\infty \le \|f\|$ for all $n$, we have $y \in L^\infty$
and $\|y\|_\infty \le \|f\|$. If $x \in L^1$, then $\sum_{i=1}^{n} xI_{A_i} \xrightarrow{L^1} x$; hence
\[
f(x) = \sum_{n=1}^{\infty} f(xI_{A_n}) = \sum_{n=1}^{\infty} \int_\Omega xI_{A_n} y_{A_n}\,d\mu
= \sum_{n=1}^{\infty} \int_{A_n} xy\,d\mu = \int_\Omega xy\,d\mu.
\]
Since $\|f\| \le \|y\|_\infty$ by Hölder's inequality (with $p = 1$, $q = \infty$),
the result follows.
Section 3.4 
3. Let $\{y_n\}$ be a Cauchy sequence in $M$, and let $x_0$ be any element of $L$
with norm 1. By 3.4.5(c), there is an $f \in L^*$ with $\|f\| = 1$ and $f(x_0)
= \|x_0\| = 1$; we define $A_n \in [L, M]$ by $A_n x = f(x) y_n$. Then $\|(A_n - A_m)x\|
= |f(x)|\,\|y_n - y_m\| \le \|y_n - y_m\|\,\|x\|$, so that $\|A_n - A_m\| \le \|y_n - y_m\|
\to 0$. By hypothesis, the $A_n$ converge uniformly to some $A \in [L, M]$; therefore,
$\|y_n - Ax_0\| = \|A_n x_0 - Ax_0\| \le \|A_n - A\| \to 0$.

Let $L$ be the set of all bounded scalar-valued functions on $\Omega$, with sup
norm, and let $M$ be the subspace of $L$ consisting of simple functions
\[
x = \sum_j x_j I_{A_j},
\]
where the $A_j$ are disjoint sets in $\mathcal{F}_0$. Define $g$ on $M$ by
\[
g(x) = \sum_j x_j \mu_0(A_j).
\]
Now $|g(x)| \le \max_j |x_j| \sum_j \mu_0(A_j) \le \max_j |x_j|\,\mu_0(\Omega) = \mu_0(\Omega)\|x\|$; hence
$\|g\| \le \mu_0(\Omega) < \infty$. By the Hahn-Banach theorem, $g$ has an extension
to a continuous linear functional $f$ on $L$, with $\|f\| = \|g\|$. Define $\mu(A)
= f(I_A)$, $A \subset \Omega$. Since $f$ is linear, $\mu$ is finitely additive, and since $f$ is an
extension of $g$, $\mu$ is an extension of $\mu_0$. Now if $\mu(A) < 0$, then
\[
f(I_{A^c}) = \mu(A^c) = \mu(\Omega) - \mu(A) > \mu(\Omega).
\]
But $\|I_{A^c}\| = 1$, so that $\|f\| > \mu(\Omega)$, a contradiction.


Since $A_n x \to Ax$ for each $x$, $\sup_n \|A_n x\| < \infty$. By the uniform boundedness
principle, $\sup_n \|A_n\| = M < \infty$; hence
\[
\|Ax\| = \lim_{n \to \infty} \|A_n x\| = \liminf_{n \to \infty} \|A_n x\|
\le \|x\| \liminf_{n \to \infty} \|A_n\| \le M\|x\|.
\]


Let $L$ be the set of all complex-valued functions $x$ on $[0, 1]$ with a continuous
derivative $x'$, $M$ the set of all continuous complex-valued functions
on $[0, 1]$, with the sup norm on $L$ and $M$. If $Ax = x'$, $x \in L$, then $A$
is a linear map of $L$ onto $M$, and $A$ is closed. For if $x_n \to x$ and $x_n' \to y$,
then since convergence relative to the sup norm is uniform convergence,
$\int_0^t x_n'(s)\,ds \to \int_0^t y(s)\,ds = z(t)$. Thus $x_n(t) - x_n(0) \to z(t)$; hence $x(t) -
x(0) = z(t)$. Therefore $x' = z' = y$. But $A$ is unbounded, for if $x_n(t) =
\sin nt$, then $\|x_n\| \le 1$, $\|Ax_n\| \to \infty$.
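The unboundedness of $A$ can be seen numerically. The sketch below is an illustration only: it estimates sup norms on a finite grid of $[0, 1]$ (an approximation, with hypothetical helper names), and shows that $\|x_n\|$ stays at most 1 while $\|Ax_n\| = \sup_t |n \cos nt| = n$.

```python
import math

def sup_norm(func, points=10001):
    """Grid estimate of sup over [0, 1] of |func(t)|; the grid contains t = 0 and t = 1."""
    return max(abs(func(i / (points - 1))) for i in range(points))

def x_n(n):
    """x_n(t) = sin(n t)."""
    return lambda t: math.sin(n * t)

def dx_n(n):
    """(A x_n)(t) = x_n'(t) = n cos(n t)."""
    return lambda t: n * math.cos(n * t)

if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, sup_norm(x_n(n)), sup_norm(dx_n(n)))
```

The second printed column stays bounded by 1 and the third grows like $n$, so no constant $k$ can satisfy $\|Ax\| \le k\|x\|$ for all $x \in L$.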


By the open mapping theorem, $A$ is open; hence $A\{x \in L: \|x\| < 1\}$ is a
neighborhood of 0 in $M$; say, $\{y \in M: \|y\| < \delta\} \subset A\{x \in L: \|x\| < 1\}$.
If $y \in M$, $y \ne 0$, then $\delta y/2\|y\|$ has norm less than $\delta$, hence equals $Az$ for
some $z \in L$, $\|z\| < 1$. If $x = (2\|y\|/\delta)z$, then $Ax = y$ and $\|x\| < 2\|y\|/\delta$.
Thus we may take $k = 2/\delta$.
Section 4.8


4. To prove (a), let $A, B \in \mathcal{C}_{i_1}$, $B \subset A$. If $C_{i_k} \in \mathcal{C}_{i_k}$, $k = 2, \ldots, n$, then (using
a product notation for intersection)
\[
P[(A - B)C_{i_2} \cdots C_{i_n}] = P(AC_{i_2} \cdots C_{i_n}) - P(BC_{i_2} \cdots C_{i_n})
\]
\[
= [P(A) - P(B)] \prod_{k=2}^{n} P(C_{i_k}) = P(A - B)P(C_{i_2}) \cdots P(C_{i_n}).
\]
Thus $A - B$ can be added to $\mathcal{C}_{i_1}$ while preserving independence. Since
$i_1$ is arbitrary, (a) follows. The proofs of (b) and (c) are quite similar,
and (d) follows from (a), (b), and (c).

Now let $\mathcal{C}_1 = \{A, B\}$, $\mathcal{C}_2 = \{C\}$, where $A$ and $C$ are independent, and
$B$ and $C$ are independent. Since $P[(A \cap B) \cap C]$ need not equal $P(A \cap
B)P(C)$ [see 4.3.3(b)], $A \cap B$ cannot be added to $\mathcal{C}_1$.
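A standard concrete choice of such events (an assumption for illustration, not taken from the text) uses two fair coin tosses: $A$ = first toss is heads, $B$ = second toss is heads, $C$ = the two tosses agree. The sketch below verifies with exact rational arithmetic that $A, C$ and $B, C$ are independent pairs while $A \cap B$ and $C$ are not independent.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin tosses, each of the four outcomes has probability 1/4.
OUTCOMES = list(product("HT", repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OUTCOMES if event(w)), len(OUTCOMES))

def both(e1, e2):
    """Intersection of two events."""
    return lambda w: e1(w) and e2(w)

A = lambda w: w[0] == "H"   # first toss is heads
B = lambda w: w[1] == "H"   # second toss is heads
C = lambda w: w[0] == w[1]  # the two tosses agree

if __name__ == "__main__":
    print(prob(both(A, C)), prob(A) * prob(C))
    print(prob(both(B, C)), prob(B) * prob(C))
    print(prob(both(both(A, B), C)), prob(both(A, B)) * prob(C))
```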

Finally, we show that the $\sigma(\mathcal{C}_i)$ are independent if each $\mathcal{C}_i$ is closed
under finite intersection. Fix $i$, and consider the collection of classes $\mathcal{D}_i$
such that $\mathcal{D}_i$ is closed under finite intersection, $\mathcal{C}_i \subset \mathcal{D}_i$, and $\mathcal{D}_i$ and
the $\mathcal{C}_j$, $j \ne i$, are independent. Partially order the $\mathcal{D}_i$ by inclusion. Each
chain has an upper bound (the union of the chain) so there is a maximal
class $\mathcal{E}_i$. Since $\mathcal{E}_i$ is closed under finite intersection, it is closed
under arbitrary differences by (a) ($A - B = A - (A \cap B)$); hence by (a),
(b), and (c), $\mathcal{E}_i$ is a $\sigma$-field. Thus $\sigma(\mathcal{C}_i) \subset \mathcal{E}_i$, and consequently $\mathcal{C}_i$ can
be replaced by $\sigma(\mathcal{C}_i)$ while preserving independence. Since $i$ is arbitrary,
the result follows.

$P\{Y \in B\} = P\{X \in g^{-1}(B)\} = \int_{g^{-1}(B)} f_1(x)\,dx$. Under the transformation
$x = h(y)$, $y = g(x)$, this becomes $\int_B f_1(h(y))\,|J(y)|\,dy$ (see Problem 6,
Section 2.3). The result follows.
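Let $A_i = \{X_i > 2 + T_1\}$, $T_1 = \min_i X_i$. Then $X_0 = \sum_{i=1}^{n} I_{A_i}$, where $I_{A_i}$
is the indicator of $A_i$. Therefore $E(X_0) = \sum_{i=1}^{n} E(I_{A_i}) = \sum_{i=1}^{n} P(A_i)$,
where $E(X_0) = \int_\Omega X_0\,dP$. By symmetry,
\[
E(X_0) = nP\{X_2 > 2 + T_1\}
= n\sum_{i \ne 2} P\{T_1 = X_i,\ X_2 > 2 + X_i\}
\]
\[
= n(n-1)P\{T_1 = X_1,\ X_2 > 2 + X_1\}
= n(n-1)P\{X_2 > 2 + X_1,\ X_3 > X_1, \ldots, X_n > X_1\}
\]
\[
= n(n-1)\int_{-\infty}^{\infty} f(x_1)\,dx_1 \int_{2+x_1}^{\infty} f(x_2)\,dx_2 \int_{x_1}^{\infty} f(x_3)\,dx_3 \cdots \int_{x_1}^{\infty} f(x_n)\,dx_n
\]
\[
= n(n-1)\int_{-\infty}^{\infty} f(x)[1 - F(x)]^{n-2}[1 - F(2 + x)]\,dx.
\]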
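For a quick sanity check of the final formula, assume (purely for illustration) that the $X_i$ are independent exponential variables with rate 1, so $f(x) = e^{-x}$ and $1 - F(x) = e^{-x}$ for $x \ge 0$. The integral then evaluates to $n(n-1)e^{-2}\int_0^\infty e^{-nx}\,dx = (n-1)e^{-2}$. The sketch below compares this with a seeded Monte Carlo estimate; the function names are hypothetical.

```python
import math
import random

def simulate_mean_count(n, trials=100_000, seed=1):
    """Monte Carlo estimate of E(X_0), the expected number of X_i exceeding 2 + min_i X_i,
    for i.i.d. exponential(1) samples (an illustrative assumption)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        t1 = min(xs)
        total += sum(1 for x in xs if x > 2 + t1)
    return total / trials

def formula_exponential(n):
    """n(n-1) * int f(x) [1-F(x)]^(n-2) [1-F(2+x)] dx for the exponential(1) density."""
    return (n - 1) * math.exp(-2)

if __name__ == "__main__":
    print(simulate_mean_count(5), formula_exponential(5))
```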


7. Define $F^{-1}$ and $X$ as suggested. Then if $0 < y < 1$, $F^{-1}(y) \le x$ iff
$y \le F(x)$. For if $F^{-1}(y) > x$, we can find $x_0 > x$ such that $F(x_0) < y$,
and thus $F(x) \le F(x_0) < y$; if $F(x) < y$, by right continuity we can find
$x_0 > x$ such that $F(x_0) < y$; therefore $F^{-1}(y) \ge x_0 > x$. Now
\[
P\{\omega: X(\omega) \le x\} = P\{\omega: F^{-1}(\omega) \le x\} = P\{\omega: \omega \le F(x)\} = F(x).
\]
[This also shows that $X$ is measurable, and that
$X(\omega) = \min\{x: F(x) \ge \omega\}$, $0 < \omega < 1$.]
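The equivalence $F^{-1}(y) \le x$ iff $y \le F(x)$ can be checked directly on a small discrete distribution. The sketch below is an illustration under an assumed example (atoms at 0, 1, 3 with masses 1/4, 1/2, 1/4) and takes $F^{-1}(y) = \min\{x: F(x) \ge y\}$, as in the bracketed remark above.

```python
from fractions import Fraction

# Illustrative discrete distribution: (atom, probability) pairs, sorted by atom.
ATOMS = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def cdf(x):
    """F(x) = P{X <= x}."""
    return sum(p for a, p in ATOMS if a <= x)

def quantile(y):
    """F^{-1}(y) = min{x : F(x) >= y} for 0 < y < 1."""
    for a, _ in ATOMS:
        if cdf(a) >= y:
            return a
    raise ValueError("y must lie strictly between 0 and 1")

if __name__ == "__main__":
    for k in (1, 5, 6, 15, 16, 19):
        print(Fraction(k, 20), quantile(Fraction(k, 20)))
```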


Section 4.10 


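2. Separate $F$ into discrete and absolutely continuous parts: $F = F_1 + F_2$,
where
\[
F_1(x) = \begin{cases} 0, & x < 2,\\ \frac{1}{3}, & x \ge 2,\end{cases}
\qquad
F_2(x) = \int_{-\infty}^{x} f_2(t)\,dt,
\]
where
\[
f_2(x) = \begin{cases} 0, & x < 2 \text{ or } x > 3,\\ \frac{2}{3}, & 2 \le x \le 3.\end{cases}
\]
Thus
\[
E(X^2) = \int_{-\infty}^{\infty} x^2\,dF(x) = \int_{-\infty}^{\infty} x^2\,dF_1(x) + \int_{-\infty}^{\infty} x^2\,dF_2(x)
= 2^2\left(\tfrac{1}{3}\right) + \int_{-\infty}^{\infty} x^2 f_2(x)\,dx
= \frac{4}{3} + \int_2^3 \frac{2}{3}x^2\,dx = \frac{50}{9}.
\]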
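The arithmetic can be reproduced exactly. The sketch below assumes the decomposition read off from the solution, namely an atom of mass 1/3 at $x = 2$ and density 2/3 on $[2, 3]$ (the only split of total mass 1 that yields $4/3$ for the atom term and $50/9$ overall).

```python
from fractions import Fraction

def second_moment():
    """E(X^2) for an atom of mass 1/3 at x = 2 plus density 2/3 on [2, 3]."""
    jump_part = Fraction(2) ** 2 * Fraction(1, 3)
    # integral of (2/3) x^2 over [2, 3] equals (2/3) * (3^3 - 2^3) / 3
    density_part = Fraction(2, 3) * Fraction(3 ** 3 - 2 ** 3, 3)
    return jump_part + density_part

if __name__ == "__main__":
    print(second_moment())
```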


Section 4.11


1. (a) Assume $Y_i: (\Omega, \mathcal{F}) \to (\Omega_i, \mathcal{F}_i)$, $i = 1, 2, \ldots$. By hypothesis,
\[
P\{(Y_1, \ldots, Y_n) \in A,\ (Y_{n+1}, Y_{n+2}, \ldots) \in B\}
= P\{(Y_1, \ldots, Y_n) \in A\}\,P\{(Y_{n+1}, Y_{n+2}, \ldots) \in B\}
\]
if $A$ is a measurable rectangle in $\prod_{i=1}^{n} \mathcal{F}_i$ and $B$ is a measurable
rectangle in $\prod_{i=n+1}^{\infty} \mathcal{F}_i$; the formula is still valid if $A$ and $B$ are finite
disjoint unions of measurable rectangles. Two applications of the
monotone class theorem establish this result for all $A \in \prod_{i=1}^{n} \mathcal{F}_i$ and
$B \in \prod_{i=n+1}^{\infty} \mathcal{F}_i$.

(b) Let $\mathcal{C}$ be the class of sets $B \in \mathcal{F}^\infty$ such that
\[
P\{(Y_1, Y_2, \ldots) \in B\} = P\{(Y_n, Y_{n+1}, \ldots) \in B\}.
\]
Since the $Y_i$ are independent and $P_{Y_i}$ is the same for all $i$, all measurable
rectangles belong to $\mathcal{C}$; hence $\mathcal{C}$ contains all finite disjoint
unions of measurable rectangles. But $\mathcal{C}$ is a monotone class; hence
$\mathcal{C} = \mathcal{F}^\infty$, and the result follows.
Section 5.3


1. (a) The conditional density of $X$ given $Y$ is $h(x|y) = f(x, y)/f_2(y)$,
where $f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$. Thus
\[
E(g(X)|Y = y) = \int_{-\infty}^{\infty} g(x)h(x|y)\,dx,
\]
assuming $E[g(X)]$ exists [cf. 5.3.5(c), Eq. (5)].

(b) $E(Y|A) = E(YI_A)/P(A)$ [see 5.3.5(b)]. Now
\[
P(A) = \int_{x \in B}\int_{y=-\infty}^{\infty} f(x, y)\,dx\,dy
\]
and
\[
E(YI_A) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} yI_B(x)f(x, y)\,dx\,dy
= \int_{x \in B}\int_{y=-\infty}^{\infty} yf(x, y)\,dx\,dy.
\]

(c) $E(X|A) = E(XI_A)/P(A)$, where
\[
P(A) = \iint_{x+y \in B} f(x, y)\,dx\,dy
\]
and
\[
E(XI_A) = \iint_{x+y \in B} xf(x, y)\,dx\,dy.
\]

3. By 5.3.1, $P\{X \in A, Y \in B\} = \sum_{x \in A} P\{X = x\}\int_B h(y|x)\,dy$. Thus (take $A
= R$) $Y$ has density $f(y) = \sum_x P\{X = x\}h(y|x)$. Now define
\[
P\{X \in A|Y = y\} = \sum_{x \in A} P\{X = x|Y = y\},
\]
where $P\{X = x|Y = y\}$ is as specified in the problem. Then
\[
\int_B P\{X \in A|Y = y\}\,dP_Y(y) = \int_B P\{X \in A|Y = y\}f(y)\,dy
= \int_B \sum_{x \in A} P\{X = x\}h(y|x)\,dy
\]
\[
= \sum_{x \in A} P\{X = x\}\int_B h(y|x)\,dy = P\{X \in A, Y \in B\}.
\]
The result follows from 5.3.1.
Section 5.4 


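1. (a) We show that $\{(X(\omega), Z(\omega)): \omega \in \Omega\}$ is a function; $f$ may then
be defined arbitrarily off $X(\Omega)$. If we do not have a function, then
there are points $\omega_1, \omega_2 \in \Omega$ with $X(\omega_1) = X(\omega_2)$ but $Z(\omega_1) \ne Z(\omega_2)$.
Let $C_1, C_2 \in \mathcal{F}''$, $C_1 \cap C_2 = \emptyset$, with $Z(\omega_1) \in C_1$, $Z(\omega_2) \in C_2$. Now
$Z^{-1}(C_1) \cap Z^{-1}(C_2) = \emptyset$, and since $C_1, C_2 \in \mathcal{F}''$, we have $Z^{-1}(C_j)
= X^{-1}(B_j)$ for some $B_j \in \mathcal{F}'$, $j = 1, 2$. Now $Z(\omega_1) \in C_1$; hence
$\omega_1 \in Z^{-1}(C_1) = X^{-1}(B_1)$; therefore $X(\omega_1) \in B_1$. But $X(\omega_1) \notin B_2$,
for if so, $Z(\omega_1) \in C_2$ as well as $C_1$. Similarly, $X(\omega_2) \in B_2$, $X(\omega_2)
\notin B_1$. But $X(\omega_1) = X(\omega_2)$, and this contradicts the fact that $(B_1 - B_2)
\cap (B_2 - B_1)$ is always empty.

(b) Let $\Omega_0 = X(\Omega)$. If $C \in \mathcal{F}''$, then
\[
Z^{-1}(C) = X^{-1}(f^{-1}(C)) = X^{-1}(f^{-1}(C) \cap \Omega_0).
\]
But
\[
Z^{-1}(C) = X^{-1}(B) = X^{-1}(B \cap \Omega_0) \quad \text{for some } B \in \mathcal{F}'.
\]
Since $X$ maps onto $\Omega_0$, $X[X^{-1}(A)] = A$ for any $A \subset \Omega_0$; hence
$f^{-1}(C) \cap \Omega_0 = B \cap \Omega_0 \in \mathcal{F}'$. But if $f(\Omega' - \Omega_0) = \{a\}$, then
\[
f^{-1}(C) \cap (\Omega' - \Omega_0) = \begin{cases} \emptyset & \text{if } a \notin C,\\ \Omega' - \Omega_0 & \text{if } a \in C.\end{cases}
\]
Therefore $f^{-1}(C) \cap (\Omega' - \Omega_0) \in \mathcal{F}'$; hence $f^{-1}(C) \in \mathcal{F}'$.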
Section 5.5 
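2.
\[
\int_{\{X \in A, Z \in B\}} Y\,dP = E[Y(I_A \circ X)(I_B \circ Z)]
\]
\[
= E[Y(I_A \circ X)]\,E[I_B \circ Z] \qquad \text{by independence}
\]
\[
= E[E(Y(I_A \circ X)|X)]\,E[I_B \circ Z]
= E[(I_A \circ X)E(Y|X)]\,E[I_B \circ Z]
\]
\[
= E[(I_A \circ X)E(Y|X)(I_B \circ Z)] \qquad \text{by independence}
\]
\[
= \int_{\{X \in A, Z \in B\}} E(Y|X)\,dP.
\]
Thus
\[
\int_{\{(X,Z) \in C\}} Y\,dP = \int_{\{(X,Z) \in C\}} E(Y|X)\,dP \qquad (1)
\]
for $C$ a measurable rectangle $A \times B$, $A \in \mathcal{F}'$, $B \in \mathcal{F}''$ (where $X: (\Omega, \mathcal{F})
\to (\Omega', \mathcal{F}')$, $Z: (\Omega, \mathcal{F}) \to (\Omega'', \mathcal{F}'')$). By the monotone class theorem,
(1) holds for all $C \in \mathcal{F}' \times \mathcal{F}''$. [Integrability of $Y$ is used in showing that
if (1) holds for $C_1, C_2, \ldots$ and $C_n \uparrow C$, then (1) holds for $C$.]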


Section 5.6 


1. (a) Let $g(x, y) = I_{A_0}(x)I_{B_0}(y)$ be the indicator of a measurable rectangle
$A_0 \times B_0 \in \mathcal{F}' \times \mathcal{F}''$. Then
\[
\int_{\{X \in A\}} g(X, Y)\,dP = P\{X \in A \cap A_0,\ Y \in B_0\}
= \int_{A \cap A_0} P\{Y \in B_0|X = x\}\,dP_X(x)
= \int_A I_{A_0}(x)P_x(B_0)\,dP_X(x).
\]
Thus
\[
E[g(X, Y)|X = x] = I_{A_0}(x)P_x(B_0)
= \int I_{A_0}(x)I_{B_0}(y)\,dP_x(y)
= \int g(x, y)\,dP_x(y) \quad (\text{a.e. } [P_X]).
\]
Thus the result holds for $g$ of this type. We proceed to indicators of
arbitrary sets in $\mathcal{F}' \times \mathcal{F}''$ using the monotone class theorem, and
then to arbitrary $g$ in the usual way.

(b) By (a),
\[
P\{(X, Y) \in C|X = x\} = E[I_C(X, Y)|X = x] = \int I_C(x, y)\,dP_x(y) = P_x(C(x)) \quad (\text{a.e. } [P_X]).
\]

(c) $P\{X \in \Omega', (X, Y) \in C\} = \int_{\Omega'} P\{(X, Y) \in C|X = x\}\,dP_X(x)$ by definition
of conditional probability. The result follows from (b).

(a) Since $\mu(E) > 0$ and $\delta < 1$, there is an open set $V \supset E$ such that
$\mu(V) < \delta^{-1}\mu(E)$; $V$ is a disjoint union of open intervals $I_n$. Then
\[
\delta\sum_n \mu(I_n) = \delta\mu(V) < \mu(E) = \mu(E \cap V) = \sum_n \mu(E \cap I_n). \qquad (1)
\]
Therefore $\delta\mu(I_n) < \mu(E \cap I_n)$ for some $n$. [Note that
\[
\sum_n \mu(I_n) = \mu(V) < \delta^{-1}\mu(E) < \infty,
\]
so it is not possible to have both sums infinite in (1).]

(b) By (a) there is an open interval $I$ such that $\mu(E \cap I) > \frac{3}{4}\mu(I)$. We
show that $(-\frac{1}{2}\mu(I), \frac{1}{2}\mu(I)) \subset D(E)$. Let $|x| < \frac{1}{2}\mu(I)$. If $E \cap I$ and
$(E \cap I) + x$ are disjoint, the measure of their union is $2\mu(E \cap I) >
\frac{3}{2}\mu(I)$. But $(E \cap I) \cup [(E \cap I) + x] \subset I \cup (I + x)$, an open interval of
length less than $\mu(I) + \frac{1}{2}\mu(I) = \frac{3}{2}\mu(I)$, a contradiction. Thus there
is an element $y \in (E \cap I) \cap [(E \cap I) + x]$. But then $y \in E$ and $y =
z + x$ for some $z \in E$; hence $x \in D(E)$, as desired.

(c)


(d) 


(e) 


(£) 


Since the circle is compact, there is a subsequence converging
to a point $v$ on the circle. Given any positive integer $N$, choose $z_n$
such that $n \ge N$ and $|z_n - v| < \varepsilon/2$; then pick $z_{n+k}$ ($k > 0$) such that
$|z_{n+k} - v| < \varepsilon/2$. Then $0 < |z_n - z_{n+k}| < \varepsilon$. (Note that $z_n \ne z_{n+k}$
since $\alpha/2\pi$ is irrational.) Thus $z_n, z_{n+k}, z_{n+2k}, \ldots$ form a chain that
eventually goes entirely around the circle, with the distance between
successive points less than $\varepsilon$. Thus, given $N$, we can find $z_r$, $r \ge N$,
such that $|z_r - z| < \varepsilon$. The result follows.
Since C = {1+ x: x € B}, it suffices to consider B. But B is dense 
iff the set of numbers nZ, n an integer, reduced modulo 2, is dense in 
[0, 2). Equivalently (consider@ > e, 0 < 6 < 2), {e'"*: n an integer} 
is dense in the circle if a@/z is irrational. But in this case a@/27 is also 
irrational, and the result follows from (c). 

Let $F \in \mathcal{B}(R)$, $F \subset E_0$. We claim that $D(F) \cap A \subset \{0\}$. For if $x, y \in
F$ and $x - y \in A$, then $x \sim y$; but $x, y \in E_0$; hence $x = y$ by definition
of $E_0$. Now assume $\mu(F) > 0$. $D(F)$ includes a neighborhood
of 0 by (b), so that $(0, a) \subset D(F)$ for some $a > 0$. Since $A$ is
dense by (c), we have $(0, a) \cap A \ne \emptyset$, contradicting $D(F) \cap A \subset \{0\}$.
Thus $\mu(F)$ must be 0, so that if $E_0$ is Lebesgue measurable, then
$\mu(E_0) = 0$.

Now if $x \in R$, then $x$ is equivalent to some $y \in E_0$; hence $x - y
\in A$. Therefore $R = \bigcup\{E_0 + a: a \in A\}$. But if $y + a_1 = z + a_2$, where
$y, z \in E_0$, $a_1, a_2 \in A$, then $y - z = a_2 - a_1 \in A$. (Note that $A$ is a
group under addition.) Thus $y \sim z$; but since $y, z \in E_0$, $y = z$ and
therefore $a_1 = a_2$. Thus the sets $E_0 + a$, $a \in A$, are disjoint.

Finally, assume $E_0$ Lebesgue measurable. Then $\mu(E_0 + a) = \mu(E_0)$
by translation-invariance of Lebesgue measure. Since $A$ is countable,
the preceding paragraph implies that $\mu(R) = 0$, a contradiction.

If $x \in R$, then $x = y + a$ for some $y \in E_0$, $a \in A$ [see the argument of
(e)]. Since $A = B \cup C$, it follows that $R = M \cup M'$. Let $F$ be a Borel
subset of $M$. We claim that $D(F) \cap C \subset \{0\}$. Let $x, y \in F$ with $x - y
\in C \subset A$. Then $x \sim y$, and $x = z_1 + b_1$, $y = z_2 + b_2$, where $z_1,
z_2 \in E_0$, $b_1, b_2 \in B$. It follows that $z_1 - z_2 = x - y + b_2 - b_1 \in A$;
hence $z_1 = z_2$. But then $x - y = b_1 - b_2 \in B \cap C$, so $x = y$. Since
$C$ is dense by (d), the same argument as in (e) shows that $\mu(F) = 0$.
Finally, since $M' = \{x + 1: x \in M\}$, any Borel subset of $M'$ has Lebesgue
measure 0, by translation-invariance.

The first statement follows from (f). If $E \cap M \subset G \subset E$, then $E - G
\subset E - M \subset E \cap M'$, so the second statement follows from (f) also.


(a) If $(B_1 \cap H) \cup (B_2 \cap H^c) = (B_1' \cap H) \cup (B_2' \cap H^c)$, then $B_1 \cap H =
B_1' \cap H$, $B_2 \cap H^c = B_2' \cap H^c$. If, say, $\mu(B_1 - B_1') > 0$, then $B_1 -
B_1'$ is not a subset of $H^c$ since $H^c$ has inner Lebesgue measure 0,
so there is an $x \in (B_1 - B_1') \cap H$, contradicting $B_1 \cap H = B_1' \cap H$.
Thus $\mu(B_1 - B_1') = 0$; a symmetrical argument shows that $\mu(B_1' -
B_1) = 0$.

(b) If $B \in \mathcal{G}$,
\[
P(H \cap B) = \tfrac{1}{2}\mu(B) = \tfrac{1}{2}P(B) = \int_B \tfrac{1}{2}\,dP.
\]
Thus $P(H|\mathcal{G}) = \frac{1}{2}$ a.e. But $P(H|\mathcal{G}) = Q(\omega, H)$ a.e.; hence $Q(\omega, H)
= \frac{1}{2}$ a.e. Similarly $Q(\omega, H^c) = \frac{1}{2}$ a.e.

(c) If $B, B_1 \in \mathcal{G}$, $P(B \cap B_1) = \int_{B_1} I_B\,dP$, so $P(B|\mathcal{G}) = I_B$ a.e.

(d) By (b) and (c), there is a set $N \in \mathcal{G}$ with $P(N) = 0$ such that for $\omega \notin
N$, $Q(\omega, H) = Q(\omega, H^c) = \frac{1}{2}$ and $Q(\omega, B) = I_B(\omega)$ for all intervals
$B$ with rational endpoints. For any such $\omega$, $Q(\omega, \{\omega\}) = 1$. [Consider
a sequence of rational intervals decreasing to $\{\omega\}$ and use the fact
that $Q(\omega, \cdot)$ is a probability measure.] But if $\omega \in H$, then $Q(\omega, \{\omega\})
\le Q(\omega, H) = \frac{1}{2}$, and if $\omega \notin H$, then $Q(\omega, \{\omega\}) \le Q(\omega, H^c) = \frac{1}{2}$, a
contradiction.


CHAPTER 6 


Section 6.1 


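(a)
\[
\frac{\sum_{j,k=1}^{n} P(A_j \cap A_k)}{\left[\sum_{k=1}^{n} P(A_k)\right]^2}
= \frac{\sum_{j \ne k} P(A_j)P(A_k) + \sum_{k=1}^{n} P(A_k)}{\sum_{j,k=1}^{n} P(A_j)P(A_k)}
= \frac{\sum_{j,k=1}^{n} P(A_j)P(A_k) + \sum_{k=1}^{n} P(A_k) - \sum_{k=1}^{n} [P(A_k)]^2}{\sum_{j,k=1}^{n} P(A_j)P(A_k)}.
\]
Now the excess of this ratio over 1 satisfies
\[
0 \le \frac{\sum_{k=1}^{n} P(A_k)[1 - P(A_k)]}{\left[\sum_{k=1}^{n} P(A_k)\right]^2} \le \frac{1}{\sum_{k=1}^{n} P(A_k)} \to 0 \quad \text{as } n \to \infty;
\]
hence the lim inf condition is satisfied.

(b) By Chebyshev's inequality,
\[
P\left\{\Big|\sum_{k=1}^{n} I_{A_k} - \sum_{k=1}^{n} P(A_k)\Big| \ge \varepsilon\sum_{k=1}^{n} P(A_k)\right\}
\le \frac{\operatorname{Var}\left(\sum_{k=1}^{n} I_{A_k}\right)}{\varepsilon^2\left[\sum_{k=1}^{n} P(A_k)\right]^2}.
\]
Now
\[
\operatorname{Var}\Big(\sum_{k=1}^{n} I_{A_k}\Big) = E\Big[\Big(\sum_{k=1}^{n} I_{A_k}\Big)^2\Big] - \Big[\sum_{k=1}^{n} P(A_k)\Big]^2
= \sum_{j,k=1}^{n} P(A_j \cap A_k) - \Big[\sum_{k=1}^{n} P(A_k)\Big]^2.
\]
The "lim inf" hypothesis implies the desired result.

(c) Let
\[
d_n = P\left\{\sum_{k=1}^{n} I_{A_k} < \frac{1}{2}\sum_{k=1}^{n} P(A_k)\right\}
\le P\left\{\Big|\sum_{k=1}^{n} I_{A_k} - \sum_{k=1}^{n} P(A_k)\Big| \ge \frac{1}{2}\sum_{k=1}^{n} P(A_k)\right\}.
\]
Then $\liminf_{n \to \infty} d_n = 0$ by (b). Thus we can find integers $n_1 < n_2
< \cdots$ such that $\sum_{j=1}^{\infty} d_{n_j} < \infty$. By the Borel-Cantelli lemma, with
probability 1 we have
\[
\sum_{k=1}^{n_j} I_{A_k} < \frac{1}{2}\sum_{k=1}^{n_j} P(A_k)
\]
for only finitely many $j$, that is,
\[
\sum_{k=1}^{n_j} I_{A_k} \ge \frac{1}{2}\sum_{k=1}^{n_j} P(A_k)
\]
for large enough $j$.

(d) By (c), $\sum_{k=1}^{\infty} I_{A_k}$ diverges a.e.; hence with probability 1, infinitely
many $A_n$ occur; thus $P(\limsup_n A_n) = 1$.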
Section 6.2 


1. Let $S_n = \sum_{k=1}^{n} X_k$. If $S_n/n$ converges a.e. to a finite limit, then
\[
\frac{X_n}{n} = \frac{S_n}{n} - \frac{n-1}{n}\cdot\frac{S_{n-1}}{n-1} \to 0 \quad \text{a.e.}
\]
By the second Borel-Cantelli lemma, $\sum_{n=1}^{\infty} P\{|X_n| \ge n\} < \infty$. Since
all $X_n$ have the same distribution, $\sum_{n=1}^{\infty} P\{|X_1| \ge n\} < \infty$, so by 6.2.4,
$E(X_1)$ is finite. But then $S_n/n \to E(X_1)$ a.e. by 6.2.5.

4. Let $X = (X_1, X_2, \ldots)$. Then $g(X) = g(X(T))$ whenever $T$ permutes finitely
many coordinates. Thus $\{g(X) < a\}$ is symmetric, and hence by the
Hewitt-Savage zero-one law has probability 0 or 1. If $c = \inf\{a: P\{g(X)
< a\} = 1\}$, then $g(X) = c$ a.e.

7. (a) Let $x$ be an $r$-adic rational with $r$-adic expansion $.i_1 i_2 \cdots i_n 0\,0 \cdots$.
Then $P\{x \le X < x + r^{-n}\} = P\{X_1 = i_1, \ldots, X_n = i_n\} = r^{-n}$. (Note
that $P\{X = x\} = \prod_{k=1}^{\infty} a_k$, with $a_k = r^{-1}$ for all $k$; thus $P\{X = x\} =
0$.) It follows that if $\lambda$ is Lebesgue measure, $P\{X \in I\} = \lambda(I)$ for
every $r$-adic interval $I \subset [0, 1]$, and hence for every interval $I \subset
[0, 1]$ by continuity. Thus $P\{X \le y\} = y$, $0 \le y \le 1$, as desired.

(b) Fix $r$ and $i$, and let $A_{ri} = \{x \in [0, 1]$: the relative frequency of $i$ in
the first $n$ digits of the $r$-adic expansion of $x$ converges to $1/r\}$.
Then, if $Y_j$ is the indicator of $\{X_j = i\}$,
\[
P\{X \in A_{ri}\} = P\left\{\frac{Y_1 + \cdots + Y_n}{n} \to \frac{1}{r}\right\} = 1
\]
by the strong law of large numbers. If
\[
A = \bigcap_{r=2}^{\infty}\bigcap_{i=0}^{r-1} A_{ri},
\]
it follows that $P_X(A) = P\{X \in A\} = 1$. But $P_X$ is Lebesgue measure
by part (a), and the result follows.

(c) We may write
\[
\int_0^1 R_n(x)\,dx = \int_0^1 R_n(x)\,dP_X(x) = E[R_n(X)] = E[2X_n - 1] = 0,
\]
and similarly,
\[
\int_0^1 R_n(x)R_m(x)\,dx = E[R_n(X)R_m(X)].
\]
Since $R_n(X)$ and $R_m(X)$ are independent for $n \ne m$, and $R_n^2(X) = 1$,
the result follows.
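The orthonormality in 7(c) can be verified exactly for the first few functions, assuming (as the identity $E[R_n(X)] = E[2X_n - 1]$ suggests) that $R_n(x) = 2x_n - 1$ with $x_n$ the $n$th binary digit of $x$. The sketch below uses the fact that such functions are constant on dyadic intervals, so the integral is a finite average; the helper names are hypothetical.

```python
from fractions import Fraction

def rademacher(n, x):
    """R_n(x) = 2*x_n - 1, where x_n is the n-th binary digit of x in [0, 1)."""
    digit = int(x * 2 ** n) % 2
    return 2 * digit - 1

def inner_product(n, m, depth=8):
    """Exact integral of R_n * R_m over [0, 1) for n, m <= depth: both functions
    are constant on dyadic intervals of length 2**-depth, so averaging the
    values at the left endpoints gives the integral."""
    cells = 2 ** depth
    total = sum(
        rademacher(n, Fraction(k, cells)) * rademacher(m, Fraction(k, cells))
        for k in range(cells)
    )
    return Fraction(total, cells)

if __name__ == "__main__":
    print([[int(inner_product(n, m)) for m in range(1, 5)] for n in range(1, 5)])
```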
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Section 6.3 


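3. Each $A_{nj}$ is a countable union of sets $A_{n+1,k}$; hence
\[
\int_{A_{nj}} X_{n+1}\,dP = \sum_k \int_{A_{n+1,k}} X_{n+1}\,dP
= \sum_k \lambda(A_{n+1,k}) \qquad \text{by definition of } X_{n+1}
\]
\[
= \lambda(A_{nj}) \qquad \text{since } \lambda \text{ is countably additive}
\]
\[
= \int_{A_{nj}} X_n\,dP \qquad \text{by definition of } X_n.
\]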


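5.
\[
E(Y_n) = \int_{\{p_n > 0\}} \frac{q_n(x)}{p_n(x)}\,p_n(x)\,dx \le \int q_n(x)\,dx = 1.
\]
Let $A = \{(X_1, \ldots, X_n) \in B\}$, $B \in \mathcal{B}(R^n)$. Then $\int_A Y_{n+1}\,dP = E(Y_{n+1}I_A)$;
hence
\[
\int_A Y_{n+1}\,dP = \int_{\{x \in B,\ p_{n+1}(x') > 0\}} \frac{q_{n+1}(x')}{p_{n+1}(x')}\,p_{n+1}(x')\,dx', \qquad (1)
\]
where $x' = (x_1, \ldots, x_{n+1}) \in R^{n+1}$, $x = (x_1, \ldots, x_n)$, the first $n$ coordinates
of $x'$. Now $p_n(x) = \int_{-\infty}^{\infty} p_{n+1}(x')\,dx_{n+1}$; thus if $x \in B$ and $p_n(x) =
0$, then $p_{n+1}(x') = 0$ except for $x_{n+1}$ in a set of Lebesgue measure 0, so
the integration of $q_{n+1}(x')$ with respect to $x_{n+1}$ in (1) will be 0. In other
words,
\[
\int_{\{x \in B,\ p_n(x) = 0,\ p_{n+1}(x') > 0\}} q_{n+1}(x')\,dx' = 0.
\]
Therefore the right side of (1) becomes
\[
\int_{\{x \in B,\ p_n(x) > 0,\ p_{n+1}(x') > 0\}} q_{n+1}(x')\,dx'
\le \int_{\{x \in B,\ p_n(x) > 0\}} q_{n+1}(x')\,dx'
= \int_{\{x \in B,\ p_n(x) > 0\}} q_n(x)\,dx
\]
\[
= \int_{\{x \in B,\ p_n(x) > 0\}} \frac{q_n(x)}{p_n(x)}\,p_n(x)\,dx
= E(Y_n I_A) = \int_A Y_n\,dP,
\]
proving the supermartingale property.

6. (a)
\[
A_n = \sum_{i=0}^{n-1} X_i - \sum_{i=1}^{n} E(X_i|\mathcal{F}_{i-1}), \qquad
Y_n = \sum_{i=0}^{n} X_i - \sum_{i=1}^{n} E(X_i|\mathcal{F}_{i-1}),
\]
so $Y_n - A_n = X_n$.

(b)
\[
E(Y_n|\mathcal{F}_{n-1}) = \sum_{i=0}^{n-1} X_i + E(X_n|\mathcal{F}_{n-1}) - \sum_{i=1}^{n} E(X_i|\mathcal{F}_{i-1})
= Y_{n-1} + E(X_n|\mathcal{F}_{n-1}) - E(X_n|\mathcal{F}_{n-1}) = Y_{n-1}.
\]

(c) $A_{n+1} - A_n = X_n - E(X_{n+1}|\mathcal{F}_n) \ge 0$ a.e.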
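The decomposition of Problem 6 can be traced along a sample path. The sketch below is an illustration under an assumed model that is not in the text: $X_0 = 0$ and $X_i = X_{i-1} + \xi_i - c$ with fair signs $\xi_i = \pm 1$ and drift $c \ge 0$, so that $E(X_i|\mathcal{F}_{i-1}) = X_{i-1} - c$ and the formulas give $A_n = nc$.

```python
import random

def sample_path(steps, drift, seed=0):
    """One path X_0, ..., X_steps of the assumed supermartingale X_i = X_{i-1} + xi_i - drift."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.choice((-1, 1)) - drift)
    return path

def decompose(path, drift):
    """Return the lists (A_n) and (Y_n) from Problem 6(a), using the assumed
    conditional expectation E(X_i | F_{i-1}) = X_{i-1} - drift."""
    A, Y = [], []
    for n in range(len(path)):
        cond_sum = sum(path[i - 1] - drift for i in range(1, n + 1))
        A.append(sum(path[:n]) - cond_sum)      # sum_{i=0}^{n-1} X_i - sum_{i=1}^{n} E(X_i|F_{i-1})
        Y.append(sum(path[:n + 1]) - cond_sum)  # sum_{i=0}^{n} X_i - sum_{i=1}^{n} E(X_i|F_{i-1})
    return A, Y

if __name__ == "__main__":
    xs = sample_path(10, 0.25)
    a, y = decompose(xs, 0.25)
    print(a)
    print([yn - an for yn, an in zip(y, a)] == xs)
```

Here $A_n$ is increasing and $Y_n - A_n = X_n$ on every path; the martingale property of $Y_n = X_n + nc$ is a statement about expectations and is not tested by a single path.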
Section 6.4 


1. (a) $E(X_{n+1}|X_n = 0) = p_{n+1}(a_{n+1} - a_{n+1}) + (1 - 2p_{n+1})0 = 0$,
$E(X_{n+1}|X_n = a_n) = a_n$, $E(X_{n+1}|X_n = -a_n) = -a_n$,
proving the martingale property. Since for all $\omega$, either $X_n(\omega) = 0$ for
all $n$ or for some $j$, $X_n(\omega) = \pm a_j$ for $n \ge j$, $X_n$ converges everywhere.

(b) $E(|X_2|) = 2p_2a_2$, $E(|X_3|) = 2p_2a_2 + (1 - 2p_2)2p_3a_3$,
$E(|X_4|) = 2p_2a_2 + (1 - 2p_2)2p_3a_3 + (1 - 2p_2)(1 - 2p_3)2p_4a_4$,
and so on. Thus
\[
\lim_{n \to \infty} E(|X_n|) \ge \prod_{i=2}^{\infty}(1 - 2p_i)\sum_{k=2}^{\infty} 2p_k a_k.
\]
The infinite product is greater than 0 since $\sum p_i < \infty$; hence $E(|X_n|)
\to \infty$.

4. By definition of the problem, $E(X_{n+1}|X_1, \ldots, X_n) = E(X_{n+1}|X_n)$. If before
the $n$th drawing there are $r$ balls in the urn, $c$ of them white,
\[
E\left(X_{n+1}\,\Big|\,X_n = \frac{c}{r}\right) = \frac{c}{r}\left(\frac{c+1}{r+1}\right) + \left(1 - \frac{c}{r}\right)\frac{c}{r+1} = \frac{c}{r}.
\]
Thus $E(X_{n+1}|X_n) = X_n$. Since $|X_n| \le 1$ for all $n$, $X_n \to X_\infty$ a.e. By the
dominated convergence theorem, $E(X_\infty) = \lim_n E(X_n) = E(X_1)$.
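The one-step identity above is a short algebraic fact, and it can be confirmed with exact rational arithmetic over a range of urn compositions. The function name below is a hypothetical helper.

```python
from fractions import Fraction

def expected_next_fraction(c, r):
    """E(X_{n+1} | X_n = c/r): with probability c/r a white ball is drawn and the
    urn becomes (c+1)/(r+1) white; otherwise it becomes c/(r+1) white."""
    p_white = Fraction(c, r)
    return p_white * Fraction(c + 1, r + 1) + (1 - p_white) * Fraction(c, r + 1)

if __name__ == "__main__":
    print(expected_next_fraction(3, 7), Fraction(3, 7))
```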


Section 6.5 


1. Since $|f_n|^p \le 2^{p-1}(|f_n - f|^p + |f|^p)$, $p \ge 1$, and $|f_n|^p \le |f_n - f|^p
+ |f|^p$, $p \le 1$, by 6.5.3 it suffices to show that the $|f_n - f|^p$ are uniformly
integrable. Now
\[
\int_A |f_n - f|^p\,d\mu \to 0 \quad \text{as } \mu(A) \to 0
\]
for any fixed $n$, and
\[
\int_A |f_n - f|^p\,d\mu \le \int_\Omega |f_n - f|^p\,d\mu \to 0 \quad \text{as } n \to \infty
\]
by the $L^p$-convergence. It follows that the integrals of $|f_n - f|^p$ are uniformly
continuous and uniformly bounded; the result follows from 6.5.3.


Section 6.6 


2. (a) If the $X_n$ are uniformly integrable, $E(X_n) \to E(X_\infty)$ by 6.5.5. Conversely
assume $E(X_n) \to E(X_\infty)$. If $\vee$ stands for max and $\wedge$ for
min, we have $|X_n - X_\infty| = (X_n \vee X_\infty) - (X_n \wedge X_\infty)$ and $X_n + X_\infty
= (X_n \vee X_\infty) + (X_n \wedge X_\infty)$. By hypothesis, $E(X_n + X_\infty) \to 2E(X_\infty)$,
and by the dominated convergence theorem, $E(X_n \wedge X_\infty) \to E(X_\infty)$.
Hence $E(X_n \vee X_\infty) \to E(X_\infty)$, so $E(|X_n - X_\infty|) \to E(X_\infty) - E(X_\infty)
= 0$. Thus $X_n \to X_\infty$ in $L^1$, and it follows that the $X_n$ are uniformly integrable.
(See Problem 1, Section 6.5; in general, $L^p$ convergence of $\{f_n\}$ implies
uniform integrability of $\{|f_n|^p\}$.)

(b) If $A \in \mathcal{F}_n$, $n < m$, then $\int_A X_n \ge \int_A X_m$. Let $m \to \infty$; by Fatou's
lemma,
\[
\int_A X_\infty = \int_A \lim_m X_m \le \liminf_m \int_A X_m \le \int_A X_n.
\]

(c) By Fatou's lemma, $E(X_\infty) = E(\lim_n X_n) \le \liminf_n E(X_n) = 0$.
Section 6.7 
1. $|X_T| = X_T^+ + X_T^- = 2X_T^+ - X_T$; hence $E(|X_T|) \le 2E(X_T^+) - E(X_1)$ by 6.7.3.
But $\{X_1^+, \ldots, X_n^+\}$ is a submartingale by 6.3.6(a); hence $E(X_T^+) \le E(X_n^+)$ by
6.7.3, as desired.

(a) Define $T$ as indicated. By 6.7.3, $\{X_T, X_n\}$ is a submartingale; hence
\[
E(X_n) \ge E(X_T) = \int_{\{\max X_i \ge \lambda\}} X_T\,dP + \int_{\{\max X_i < \lambda\}} X_T\,dP
\ge \lambda P\{\max X_i \ge \lambda\} + \int_{\{\max X_i < \lambda\}} X_n\,dP
\]
and the result follows.

(b) By 6.7.3, $\{X_1, X_T\}$ is a supermartingale; hence
\[
E(X_1) \ge E(X_T) = \int_{\{\max X_i \ge \lambda\}} X_T\,dP + \int_{\{\max X_i < \lambda\}} X_T\,dP
\ge \lambda P\{\max X_i \ge \lambda\} + \int_{\{\max X_i < \lambda\}} X_n\,dP.
\]
Since $-X_n \le (-X_n)^+ = X_n^-$, the result follows.

(c)
Since 


I 
{max X= 2+7) t {supx, >a] as n,k > œ, 


l<i<n 


the result follows from (a) and (b). Note also that the same inequal- 
ities hold with {sup, X, > A} replaced by {sup, X, > A}; this follows 


because | 
{supX, > À — z ļ fsupx, > a}. 


(a) Since $X_n - nm = \sum_{i=1}^{n}(Y_i - m)$, and $E(Y_i - m) = 0$, $\{X_n - nm\}$ is
a martingale. By 6.7.3, $E(X_1 - m) = E(X_{T_n} - T_n m)$; hence $E(X_{T_n})
= mE(T_n)$. Since $Y_i \ge 0$, $X_{T_n} \uparrow X_T$ as $n \to \infty$, so by the monotone
convergence theorem, $E(X_T) = mE(T)$.

(b) We write $X_n = \sum_{j=1}^{n} Y_j^+ - \sum_{j=1}^{n} Y_j^- = X_n' - X_n''$. By (a), $E(X_T')
= E(Y_1^+)E(T)$, $E(X_T'') = E(Y_1^-)E(T)$. Since $E(T)$ is finite, so are
$E(X_T')$ and $E(X_T'')$; hence $E(|X_T|) < \infty$ and
\[
E(X_T) = E(X_T') - E(X_T'') = [E(Y_1^+) - E(Y_1^-)]E(T) = mE(T).
\]

To prove (a), observe that if all $Y_i \ge 0$, then
\[
E(X_T) = \sum_{n=1}^{\infty} E(X_n I_{\{T=n\}})
= \sum_{n=1}^{\infty} E(X_n)P\{T = n\} \qquad \text{by independence}
\]
\[
= m\sum_{n=1}^{\infty} nP\{T = n\} = mE(T).
\]
Part (b) is proved just as above.
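The identity $E(X_T) = mE(T)$ can be checked exactly in a small case. The sketch below assumes, for illustration only, that the $Y_i$ are independent Bernoulli($p$) variables and that $T$ is the first index with $Y_i = 1$, capped at a fixed horizon (a bounded stopping time); all outcomes are enumerated with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def wald_check(p, horizon):
    """Return (E(X_T), m * E(T)) for Y_i i.i.d. Bernoulli(p), m = p, and
    T = min(first i with Y_i = 1, horizon), by enumerating all 2**horizon outcomes."""
    e_xt = Fraction(0)
    e_t = Fraction(0)
    for ys in product((0, 1), repeat=horizon):
        weight = Fraction(1)
        for y in ys:
            weight *= p if y == 1 else 1 - p
        # stopping time: first success, or the horizon if there is none
        t = next((i + 1 for i, y in enumerate(ys) if y == 1), horizon)
        e_xt += weight * sum(ys[:t])  # X_T = Y_1 + ... + Y_T
        e_t += weight * t
    return e_xt, p * e_t

if __name__ == "__main__":
    print(wald_check(Fraction(1, 3), 6))
```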
Section 6.8 


3. (a) $\{X_n\}$ converges a.e. to a finite limit; hence
\[
P\{|X_{n+1} - X_n| \ge b \text{ for infinitely many } n\} = 0.
\]
So, a.e., $X_{n+1} = X_n$ eventually.

(b) Since $X_n \ge 0$, $X_T I_{\{T \le n\}} \uparrow X_T$; hence by the monotone convergence
theorem,
\[
E(X_T) = \lim_{n \to \infty} \int_{\{T \le n\}} X_T = \lim_{n \to \infty} \int_{\{T \le n\}} X_n
\]
since on $\{T = k\}$, $k \le n$, we have $X_T = X_k = X_{k+1} = \cdots = X_n$. But
\[
\lim_n \int_{\{T \le n\}} X_n \le \limsup_n \int_\Omega X_n \le E(X_0)
\]
by the supermartingale property.

(c) $T$ is the time at which the betting stops. In this case, $T$ is also the
time of going broke. By (a), $T$ is a.e. finite, and the result follows.

(d) Realistically, there is a limit on what we can lose. In practice, what
we are doing is starting with a capital of $x > 0$, and stopping when
we reach $x + 1$, provided we have not been wiped out (reduced
to zero) beforehand. The probability of reaching $x + 1$ before 0 is
$x/(x + 1) < 1$, and
\[
E(X_T) = \frac{x}{x+1}(x + 1) + \frac{1}{x+1}(0) = x = E(X_0).
\]
(See Ash, 1970, 6.2 for details.)

5. (i) If $\sum_{j=1}^{\infty} I_{A_j} = \infty$ and $\sum_{j=1}^{\infty} q_j < \infty$, then $X_n \to \infty$.
(ii) If $\sum_{j=1}^{\infty} I_{A_j} < \infty$ and $\sum_{j=1}^{\infty} q_j = \infty$, then $X_n \to -\infty$.

But $\{X_n\}$ is a martingale by Problem 4; hence by 6.8.4, $X_n$ converges a.e. to
a finite limit on $\{\sup X_n < \infty \text{ or } \inf X_n > -\infty\}$. In case (i), $\inf X_n > -\infty$
and in case (ii), $\sup X_n < \infty$, so we have a contradiction unless the sets
\[
\Big\{\sum I_{A_j} = \infty,\ \sum q_j < \infty\Big\} \quad \text{and} \quad \Big\{\sum I_{A_j} < \infty,\ \sum q_j = \infty\Big\}
\]
have probability 0. (Note that $|I_{A_j} - q_j| \le 2$ so 6.8.4 actually applies.)
CHAPTER 7


Section 7.1 


4. (a) If $|h(u)| = 1$, then $h(u) = e^{iua}$ for some $a$; hence
$e^{iua} = \int_{-\infty}^{\infty} e^{iux} \, dF(x)$, or $1 = \int_{-\infty}^{\infty} e^{iu(x-a)} \, dF(x)$. Take real parts to obtain
$\int_{-\infty}^{\infty} [1 - \cos u(x - a)] \, dF(x) = 0$. Since the integrand is nonnegative, we have
$\cos u(x - a) = 1$ a.e. $[P_X]$. But $\cos u(x - a) = 1$ iff $x = a + 2\pi n u^{-1}$, $n$ an
integer, so $X$ has a lattice distribution. The converse is proved by
reversing the argument.

(b) By part (a),

$$P\{X = a + 2\pi n u^{-1} \text{ for some integer } n\} = P\{X = b + 2\pi m (\alpha u)^{-1} \text{ for some integer } m\} = 1$$

for appropriate real numbers $a$ and $b$. If $X$ is nondegenerate, the lattices
$\{a + 2\pi n u^{-1}: n \text{ an integer}\}$ and $\{b + 2\pi m (\alpha u)^{-1}: m \text{ an integer}\}$
must have at least two points in common, and this implies that $2\pi u^{-1}$
and $2\pi (\alpha u)^{-1}$ are rationally related. Thus $\alpha$ is a rational number, a
contradiction.
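The lattice criterion in part (a) is easy to observe numerically. The following sketch is only an illustration under assumed values: the offset `A`, spacing `D`, and the three-point distribution `LATTICE` are hypothetical. It evaluates the characteristic function at $u_0 = 2\pi/D$, where the argument above predicts $h(u_0) = e^{iu_0 A}$ and hence $|h(u_0)| = 1$.

```python
import cmath
import math

# Hypothetical lattice distribution: X takes values A + n*D (assumed numbers).
A, D = 0.5, 2.0
LATTICE = {A + n * D: p for n, p in [(-1, 0.2), (0, 0.5), (3, 0.3)]}


def char_fn(dist, u):
    """h(u) = E[exp(iuX)] for a finite discrete distribution {value: probability}."""
    return sum(p * cmath.exp(1j * u * x) for x, p in dist.items())


# At u0 the lattice spacing equals 2*pi/u0, so every term has the same phase.
U0 = 2 * math.pi / D

if __name__ == "__main__":
    print(abs(char_fn(LATTICE, U0)))   # modulus at the lattice frequency
    print(abs(char_fn(LATTICE, 1.0)))  # modulus at a frequency not matched to the lattice
```

At a value of $u$ not matched to the lattice spacing the terms have different phases, so the modulus drops strictly below 1.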

By 7.1.5(e), $h$ has $n$ continuous derivatives on $R$ and $h^{(k)}(0) = i^k E(X^k)$,
$k = 0, 1, \ldots, n$. Now if $h\colon I \to C$, where $I$ is an interval of $R$
containing 0, and $h$ has $n$ continuous derivatives on $I$, then for $u \in I$,

$$h(u) = \sum_{k=0}^{n-1} \frac{h^{(k)}(0)}{k!} u^k + u^n \int_0^1 h^{(n)}(ut) \frac{(1-t)^{n-1}}{(n-1)!} \, dt.$$

(This is an exercise in calculus; see Ash, 1970, p. 172 for details.)
Add and subtract

$$\frac{h^{(n)}(0)}{n!} u^n = u^n \int_0^1 h^{(n)}(0) \frac{(1-t)^{n-1}}{(n-1)!} \, dt$$

from the above equation to obtain

$$h(u) = \sum_{k=0}^{n} \frac{h^{(k)}(0)}{k!} u^k + R_n(u),$$

where

$$R_n(u) = u^n \int_0^1 \frac{(1-t)^{n-1}}{(n-1)!} [h^{(n)}(ut) - h^{(n)}(0)] \, dt.$$

Since $R_n(u)/u^n \to 0$ as $u \to 0$ by the dominated convergence theorem,
the result follows.
(b) By 7.1.5(e), $h$ has $n$ continuous derivatives, so as in part (a),

$$h(u) = \sum_{k=0}^{n-1} \frac{h^{(k)}(0)}{k!} u^k + u^n \int_0^1 h^{(n)}(ut) \frac{(1-t)^{n-1}}{(n-1)!} \, dt.$$

Now $h^{(n)}(u) = \int_{-\infty}^{\infty} (ix)^n e^{iux} \, dF(x)$ by 7.1.5(e); hence
$|h^{(n)}(ut)| \le E(|X|^n)$; the result follows.


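(a) We have

$$\frac{1}{4r^2}[h(2r) - 2h(0) + h(-2r)] = \int_{-\infty}^{\infty} \left(\frac{e^{irx} - e^{-irx}}{2r}\right)^2 dF(x) = -\int_{-\infty}^{\infty} \left(\frac{\sin rx}{rx}\right)^2 x^2 \, dF(x).$$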


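(b) By L'Hôpital's rule,

$$\begin{aligned}
\lim_{r \to 0} \frac{1}{4r^2}[h(2r) - 2h(0) + h(-2r)] &= \lim_{r \to 0} \frac{2h'(2r) - 2h'(-2r)}{8r} \\
&= \lim_{r \to 0} \frac{h'(2r) - h'(0) + h'(0) - h'(-2r)}{2(2r)} = h''(0).
\end{aligned}$$

Thus

$$\begin{aligned}
-h''(0) &= \lim_{r \to 0} \int_{-\infty}^{\infty} \left(\frac{\sin rx}{rx}\right)^2 x^2 \, dF(x) \\
&\ge \int_{-\infty}^{\infty} \liminf_{r \to 0} \left(\frac{\sin rx}{rx}\right)^2 x^2 \, dF(x) \quad \text{by Fatou's lemma} \\
&= \int_{-\infty}^{\infty} x^2 \, dF(x).
\end{aligned}$$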
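The limit in part (b) can be watched numerically: the symmetric second difference of $h$ at 0 approaches $h''(0) = -E(X^2)$. The sketch below is an illustration under assumptions; the three-point distribution `DIST` and the function names are hypothetical choices, not taken from the problem.

```python
import cmath

# Hypothetical discrete distribution with E(X^2) = 0.25*1 + 0.5*4 = 2.25.
DIST = {-1.0: 0.25, 0.0: 0.25, 2.0: 0.5}


def char_fn(dist, u):
    """h(u) = E[exp(iuX)] for a finite discrete distribution {value: probability}."""
    return sum(p * cmath.exp(1j * u * x) for x, p in dist.items())


def second_moment(dist):
    """E(X^2) computed directly from the distribution."""
    return sum(p * x * x for x, p in dist.items())


def second_difference(dist, r):
    """[h(2r) - 2h(0) + h(-2r)] / (4 r^2), which tends to h''(0) as r -> 0."""
    numerator = char_fn(dist, 2 * r) - 2 * char_fn(dist, 0.0) + char_fn(dist, -2 * r)
    return numerator / (4 * r * r)


if __name__ == "__main__":
    for r in (0.5, 0.1, 1e-3):
        print(r, second_difference(DIST, r).real, -second_moment(DIST))
```

For every $r$ the quotient equals $-\int (\sin rx / rx)^2 x^2 \, dF(x)$, so it is never smaller than $-E(X^2)$, in line with the Fatou step.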


(c) Assume the result holds up to the integer $n$, and assume $h^{(2n+2)}(0)$
exists and is finite. Then $\int_{-\infty}^{\infty} x^{2n} \, dF(x) < \infty$, so by 7.1.5(e),

$$h^{(2n)}(u) = \int_{-\infty}^{\infty} (ix)^{2n} e^{iux} \, dF(x)$$

or

$$(-1)^n h^{(2n)}(u) = \int_{-\infty}^{\infty} e^{iux} \, dG(x),$$

where

$$G(x) = \int_{-\infty}^{x} t^{2n} \, dF(t).$$

Since $G$ is a bounded distribution function with characteristic function
$(-1)^n h^{(2n)}$, part (b) shows that $\int_{-\infty}^{\infty} x^2 \, dG(x) < \infty$, that is,
$\int_{-\infty}^{\infty} x^{2n+2} \, dF(x) < \infty$. The result follows by induction.


Section 7.2 


4. (a) Choose $a, b \in I$ such that $L = \int_a^b g(u) \, du \ne 0$. (If this is not possible,
the integral of $g$ is 0 on all subintervals of $I$; hence on all Borel
subsets of $I$, and therefore $g = 0$ a.e., contradicting $|\exp(iua_n)| = 1$.)
We may assume that $\exp(iua_n)$ converges when $u = a$ and $u = b$
(since $\int_a^b g(u) \, du$ is continuous in $a$ and $b$). Now

$$a_n = \frac{a_n \int_a^b \exp(iua_n) \, du}{\int_a^b \exp(iua_n) \, du} = \frac{\exp(iba_n) - \exp(iaa_n)}{i \int_a^b \exp(iua_n) \, du} \to \frac{g(b) - g(a)}{iL}.$$

(b) This is immediate from (a).

(a) Let $F_n(x) = 1$, $x \ge n$; $F_n(x) = 0$, $x < n$ (corresponding to a random
variable $X_n \equiv n$). Let $F_0(x) = 0$ for all $x$. Then $F_n(x) \to F_0(x)$ for
all $x \in R \cup \{-\infty\}$, but $F_n(\infty) \not\to F_0(\infty)$, so $F_n$ does not
converge weakly to $F_0$.

(b) If $n$ is even, let $F_n(x) = 1$, $x \ge n$; $F_n(x) = 0$, $x < n$ (corresponding
to $X_n \equiv n$). If $n$ is odd, let $F_n(x) = 1$, $x \ge -n$; $F_n(x) = 0$,
$x < -n$ ($X_n \equiv -n$). Let $F_0(x) \equiv 0$. Then $F_n(a, b] \to F_0(a, b]$ for all
$a, b \in R$, but $F_n(-\infty, \infty] \not\to F_0(-\infty, \infty]$. Furthermore,
$\lim_{n \to \infty} F_n(x)$ does not exist for any $x \in R$.


If $F_n$ converges weakly to $F_0$, then $\{F_n\}$ is relatively compact, and hence
tight by 7.2.4. Thus assume $\{F_n\}$ tight. Given $\varepsilon > 0$, let $a$ and $b$ be finite
continuity points of $F_0$ such that $F_n(R - (a, b]) < \varepsilon$ for all $n$. Then

$$\begin{aligned}
\limsup_{n \to \infty} F_n(R) &\le \varepsilon + \limsup_{n \to \infty} F_n(a, b] \\
&= \varepsilon + F_0(a, b] \\
&\le \varepsilon + F_0(R).
\end{aligned}$$

But $F_n(R) \ge F_n(a, b]$, hence

$$\liminf_{n \to \infty} F_n(R) \ge F_0(a, b].$$

Since $\varepsilon$ is arbitrary and $b$ may be taken arbitrarily large and $a$
arbitrarily small, we have $F_n(R) \to F_0(R)$. A similar argument shows that
$F_n(-\infty, b] \to F_0(-\infty, b]$ and $F_n(a, \infty] \to F_0(a, \infty]$ if $a$ and $b$ are
finite continuity points of $F_0$. Therefore $F_n$ converges weakly to $F_0$.


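8. (a) We have

$$\begin{aligned}
E(X_{n+1} \mid \mathscr{F}_n) &= E(X_n h_{n+1}^{-1}(u) \exp(iuY_{n+1}) \mid \mathscr{F}_n) \\
&= X_n h_{n+1}^{-1}(u) E[\exp(iuY_{n+1})] \\
&\qquad \text{since } X_n \text{ is } \mathscr{F}_n\text{-measurable and the } Y_n \text{ are independent} \\
&= X_n.
\end{aligned}$$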


(b) By hypothesis, $\prod_{k=1}^{n} h_k \to h_Y$ uniformly on bounded intervals. Thus
if $I$ is a bounded open interval containing 0 on which $|h_Y| > \delta > 0$,
then $\prod_{k=1}^{n} h_k$ is bounded away from 0 on $I$. Thus for any fixed $u \in I$,
$\{X_n, \mathscr{F}_n\}$ is a bounded martingale, and hence converges a.e. But

$$X_n(\omega) = \left[\prod_{k=1}^{n} h_k(u)\right]^{-1} \exp\left[iu \sum_{k=1}^{n} Y_k(\omega)\right],$$

and the result follows.
(c) Let $C$ be the set of pairs $(u, \omega)$, $u \in I$, $\omega \in \Omega$, such that

$$\exp\left[iu \sum_{k=1}^{n} Y_k(\omega)\right]$$

fails to converge. By (b), $\{\omega: (u, \omega) \in C\}$ has probability 0 for each
$u \in I$, so by Problem 4, Section 2.6, $\{u: (u, \omega) \in C\}$ has Lebesgue
measure 0 for almost every $\omega$.

(d) Convergence a.e. implies convergence in probability since a prob- 
ability measure is finite, and convergence in probability implies 
convergence in distribution by 7.1.7. By parts (b) and (c), conver- 
gence in distribution implies convergence a.e. 




Section 7.3 


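3. Let the $X_{nk}$ be uan. Then

$$\begin{aligned}
\max_{1 \le k \le n} |h_{nk}(u) - 1| &= \max_{1 \le k \le n} \left| \int_{-\infty}^{\infty} (e^{iux} - 1) \, dF_{nk}(x) \right| \\
&\le \max_{1 \le k \le n} \int_{|x| < \varepsilon} |e^{iux} - 1| \, dF_{nk}(x) + \max_{1 \le k \le n} \int_{|x| \ge \varepsilon} |e^{iux} - 1| \, dF_{nk}(x).
\end{aligned}$$

Now $|e^{iux} - 1| \le |ux|$; hence

$$\begin{aligned}
\max_{1 \le k \le n} |h_{nk}(u) - 1| &\le \max_{1 \le k \le n} \int_{|x| < \varepsilon} |ux| \, dF_{nk}(x) + \max_{1 \le k \le n} \int_{|x| \ge \varepsilon} 2 \, dF_{nk}(x) \\
&\le |u|\varepsilon + 2 \max_{1 \le k \le n} P\{|X_{nk}| \ge \varepsilon\}.
\end{aligned}$$

The second term approaches 0 as $n \to \infty$ for any $\varepsilon > 0$ by the uan
hypothesis, and thus

$$\max_{1 \le k \le n} |h_{nk}(u) - 1| \to 0 \quad \text{as } n \to \infty$$

uniformly for $u$ in a bounded interval.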
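Conversely, assume

$$\max_{1 \le k \le n} |h_{nk}(u) - 1| \to 0.$$

By the truncation inequality 7.2.7,

$$\max_{1 \le k \le n} P\{|X_{nk}| \ge \varepsilon\} = \max_{1 \le k \le n} \int_{|x| \ge \varepsilon} dF_{nk}(x) \le K\varepsilon \int_0^{1/\varepsilon} \max_{1 \le k \le n} |1 - h_{nk}(v)| \, dv$$

since $|1 - \operatorname{Re} z| = |\operatorname{Re}(1 - z)| \le |1 - z|$; here $K$ denotes the constant
appearing in the truncation inequality.
The integral approaches 0 as $n \to \infty$ by the hypothesis and the
dominated convergence theorem, and the result follows.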
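4. (a) $\operatorname{Var} X_n = 1$ for all $n$, so $c_n = \sqrt{n}$. Now, for a given $\varepsilon > 0$, if $n$ is
large enough so that $\varepsilon\sqrt{n} > 1$, then

$$P\left\{\left|\frac{X_k}{c_n}\right| \ge \varepsilon\right\} = P\{|X_k| \ge \varepsilon\sqrt{n}\} = \begin{cases} 0 & \text{if } k < \varepsilon\sqrt{n}, \\ \dfrac{1}{k^2}\left(1 - \dfrac{1}{c}\right) & \text{if } k \ge \varepsilon\sqrt{n}. \end{cases}$$

Thus

$$\max_{1 \le k \le n} P\left\{\left|\frac{X_k}{c_n}\right| \ge \varepsilon\right\} \le \frac{1}{\varepsilon^2 n}\left(1 - \frac{1}{c}\right) \to 0.$$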


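(b) We have

$$\frac{1}{c_n^2} \sum_{k=1}^{n} \int_{\{|x| \ge \varepsilon c_n\}} x^2 \, dF_k(x) = \frac{1}{n} \sum_{k=1}^{n} E[X_k^2 I_{\{|X_k| \ge \varepsilon\sqrt{n}\}}].$$

Again let $n$ be large enough so that $\varepsilon\sqrt{n} > 1$. We obtain

$$\frac{1}{n} \sum_{k = \varepsilon\sqrt{n}}^{n} k^2 P\{|X_k| = k\} \sim \frac{1}{n}(n - \varepsilon\sqrt{n})\left(1 - \frac{1}{c}\right) \to 1 - \frac{1}{c} > 0.$$

Thus the Lindeberg condition fails for the $X_k$. Now

$$\operatorname{Var} X_{nk}' = E[X_k^2 I_{\{|X_k| \le \sqrt{n}\}}] = \begin{cases} E(X_k^2) = 1 & \text{if } k \le \sqrt{n}, \\ P\{|X_k| = 1\} = \dfrac{1}{c} & \text{if } k > \sqrt{n}. \end{cases}$$

Thus

$$(c_n')^2 = \sum_{k=1}^{n} \operatorname{Var} X_{nk}' = [\sqrt{n}] + \frac{1}{c}(n - [\sqrt{n}]) \sim \frac{n}{c}.$$

The Lindeberg sum for the $X_{nk}'$ is

$$\begin{aligned}
\frac{1}{(c_n')^2} \sum_{k=1}^{n} E[(X_{nk}')^2 I_{\{|X_{nk}'| \ge \varepsilon c_n'\}}] &\sim \frac{c}{n} \sum_{k = \varepsilon c_n'}^{\sqrt{n}} k^2 P\{|X_k| = k\} \\
&\sim \frac{c}{n}(\sqrt{n} - \varepsilon c_n')\left(1 - \frac{1}{c}\right) \to 0.
\end{aligned}$$

By 7.3.1, $S_n'/c_n' \xrightarrow{d}$ normal (0, 1).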
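The failure of the Lindeberg condition for the untruncated variables can be tabulated directly. The sketch below is a minimal illustration under stated assumptions: it takes $P\{|X_k| = k\} = k^{-2}(1 - 1/c)$ and $P\{|X_k| = 1\} = 1/c$, as used in the solution, and assumes the remaining mass sits at 0; the function name `lindeberg_sum` is hypothetical.

```python
import math


def lindeberg_sum(n, eps, c):
    """(1/n) * sum_{k<=n} E[X_k^2 ; |X_k| >= eps*sqrt(n)] for the assumed X_k.

    Assumed law of X_k (remaining mass at 0 is an assumption of this sketch):
      P{|X_k| = k} = (1/k^2)(1 - 1/c),  P{|X_k| = 1} = 1/c.
    """
    threshold = eps * math.sqrt(n)
    total = 0.0
    for k in range(1, n + 1):
        if k >= threshold:
            # Mass at |X_k| = k contributes k^2 * (1/k^2)(1 - 1/c).
            total += 1.0 - 1.0 / c
        if 1 >= threshold:
            # Mass at |X_k| = 1 contributes 1 * (1/c).
            total += 1.0 / c
    return total / n


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, lindeberg_sum(n, 0.1, 2.0))  # approaches 1 - 1/c = 0.5
```

The sum settles near $1 - 1/c$ rather than 0, which is the quantity computed in part (b).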


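(c) We have

$$\begin{aligned}
P\{S_n \ne S_n'\} &\le \sum_{k=1}^{n} P\{X_k \ne X_{nk}'\} \le \sum_{k=1}^{n} P\{|X_k| > \sqrt{n}\} \\
&\le \sum_{k \ge \sqrt{n}} P\{|X_k| = k\} \\
&\to 0 \quad \text{as } n \to \infty
\end{aligned}$$

since

$$P\{|X_k| = k\} = \frac{1}{k^2}\left(1 - \frac{1}{c}\right) \quad \text{and} \quad \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty.$$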


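(d) If $Y_n \xrightarrow{d} Y$ and $a_n \to 1$, then $a_n Y_n \xrightarrow{d} Y$; for if $h_n$ is the
characteristic function of $Y_n$ and $h$ is the characteristic function of $Y$, we
have $h_n \to h$ uniformly on bounded intervals; hence $h_n(a_n u) \to h(u)$.
Now $\sqrt{c}\,S_n'/\sqrt{n} = a_n(S_n'/c_n')$, where $a_n \to 1$, so that
$\sqrt{c}\,S_n'/\sqrt{n} \xrightarrow{d}$ normal (0, 1) by (b). Also, if $Y_n \xrightarrow{d} Y$ and
$P\{Y_n \ne Y_n'\} \to 0$, then $Y_n' \xrightarrow{d} Y$ because

$$\begin{aligned}
P\{Y_n' \le y\} &= P\{Y_n' \le y, Y_n \le y\} + P\{Y_n' \le y, Y_n > y\} \\
&\le P\{Y_n \le y\} + P\{Y_n \ne Y_n'\}.
\end{aligned}$$

Thus by (c), $\sqrt{c}\,S_n/\sqrt{n} \xrightarrow{d}$ normal (0, 1). But by (a) and (b),
$S_n/c_n = S_n/\sqrt{n}$ does not converge in distribution to normal (0, 1).

1. (a) We have

$$\begin{aligned}
E[\exp(iuk(\operatorname{sgn} X)|X|^{-r})] &= \int_{-n}^{n} \frac{1}{2n} \exp(iuk(\operatorname{sgn} x)|x|^{-r}) \, dx \\
&= \frac{1}{n} \int_0^n \cos(ukx^{-r}) \, dx \\
&= 1 - \frac{1}{n} \int_0^n [1 - \cos(ukx^{-r})] \, dx \\
&= 1 - \frac{1}{n} \left\{ \int_0^{\infty} [1 - \cos(ukx^{-r})] \, dx - g(n) \right\},
\end{aligned}$$

where $g(n) = \int_n^{\infty} [1 - \cos(ukx^{-r})] \, dx \to 0$ as $n \to \infty$ since
$1 - \cos(ukx^{-r}) = 2\sin^2 \tfrac{1}{2}ukx^{-r} \sim \tfrac{1}{2}u^2k^2x^{-2r}$, $2r > 1$.
The result follows from Theorem 7.1.2.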


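(b) Let $I = \int_0^{\infty} [1 - \cos(kux^{-r})] \, dx$; then [see Eq. (3) of the proof of
7.3.1],

$$n \operatorname{Log}\left(1 - \frac{1}{n}[I - g(n)]\right) = g(n) - I + \frac{\theta}{n}[I - g(n)]^2 \to -I;$$

hence $h_n(u) \to e^{-I}$.

(c) With $y = k|u|x^{-r}$, we have

$$dy = -|u|krx^{-r-1} \, dx = -r(k|u|)^{-1/r} y^{1 + 1/r} \, dx;$$

hence

$$h(u) = \exp\left(-\frac{(k|u|)^{1/r}}{r} \int_0^{\infty} (1 - \cos y) y^{-1 - 1/r} \, dy\right).$$

This is of the form $\exp[-d|u|^{\alpha}]$, $d > 0$, $0 < \alpha < 2$.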
Section 7.5 


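1. A logarithm of $h(u)$ is given by

$$iu + \operatorname{Log}(1 - q) - \operatorname{Log}(1 - qe^{iu}) = iu + \sum_{k=1}^{\infty} \frac{q^k}{k}(e^{iuk} - 1).$$

Thus

$$h(u) = \lim_{n \to \infty} \prod_{k=1}^{n} \exp\left[\frac{iu}{n} + \frac{q^k}{k}(e^{iuk} - 1)\right].$$

But

$$\exp\left[\frac{iu}{n} + \frac{q^k}{k}(e^{iuk} - 1)\right]$$

is of the Poisson type [see 7.5.3(b)] with $\lambda = q^k/k$, $a = k$, $b = 1/n$, and
the result follows from 7.5.7.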
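The series expansion of the logarithm can be confirmed numerically. The sketch below is illustrative: it assumes the geometric law $P\{X = k\} = (1 - q)q^{k-1}$, $k \ge 1$, which is the distribution whose characteristic function has the logarithm displayed above, and the function names and the sample values of `u` and `q` are hypothetical.

```python
import cmath


def log_cf_closed(u, q):
    """iu + Log(1 - q) - Log(1 - q e^{iu}), using principal logarithms."""
    return 1j * u + cmath.log(1 - q) - cmath.log(1 - q * cmath.exp(1j * u))


def log_cf_series(u, q, terms=200):
    """iu + sum_{k>=1} (q^k / k)(e^{iuk} - 1), truncated after `terms` terms."""
    tail = sum(q**k / k * (cmath.exp(1j * u * k) - 1) for k in range(1, terms + 1))
    return 1j * u + tail


def geometric_cf(u, q, terms=400):
    """E[e^{iuX}] for the assumed law P{X = k} = (1 - q) q^(k-1), k = 1, 2, ..."""
    return sum((1 - q) * q ** (k - 1) * cmath.exp(1j * u * k) for k in range(1, terms + 1))


if __name__ == "__main__":
    u, q = 1.3, 0.4
    print(log_cf_closed(u, q), log_cf_series(u, q))
    print(cmath.exp(log_cf_closed(u, q)), geometric_cf(u, q))
```

Each summand $(q^k/k)(e^{iuk} - 1)$ is the logarithm of a Poisson-type characteristic function with jump size $k$, which is how the series exhibits the infinite divisibility.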


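4. We have

$$\begin{aligned}
F_3(z) &= P_{XY}\{(x, y) \in R^2: x + y \le z\} \\
&= \iint_{x + y \le z} dP_{XY}(x, y) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I_{\{x + y \le z\}} \, dP_X(x) \, dP_Y(y) \\
&= \int_{-\infty}^{\infty} P_X\{x: x \le z - y\} \, dP_Y(y) \\
&= \int_{-\infty}^{\infty} F_1(z - y) \, dF_2(y) \\
&= \int_{-\infty}^{\infty} F_2(z - x) \, dF_1(x) \quad \text{by a symmetrical argument.}
\end{aligned}$$

In the case where $X$ has a density,

$$F_3(z) = \int_{-\infty}^{\infty} F_1(z - y) \, dF_2(y) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{z - y} f_1(x) \, dx \right) dF_2(y).$$

Let $x = u - y$ to obtain

$$\begin{aligned}
F_3(z) &= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{z} f_1(u - y) \, du \right) dF_2(y) \\
&= \int_{-\infty}^{z} \left( \int_{-\infty}^{\infty} f_1(u - y) \, dF_2(y) \right) du,
\end{aligned}$$

and the result follows.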
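For discrete distributions the convolution formula $F_3(z) = \int F_1(z - y) \, dF_2(y)$ is a finite sum, which makes it easy to check. The following sketch is an illustration only; the fair die and fair coin distributions and the function names are assumptions chosen for the example.

```python
from fractions import Fraction

# Hypothetical independent random variables: a fair die and a fair 0/1 coin.
DIE = {face: Fraction(1, 6) for face in range(1, 7)}
COIN = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def cdf(dist, x):
    """Distribution function F(x) = P{X <= x} of a finite discrete law."""
    return sum(p for value, p in dist.items() if value <= x)


def cdf_of_sum(dist_x, dist_y, z):
    """F_3(z) = integral of F_1(z - y) dF_2(y); a finite sum over the atoms of Y."""
    return sum(cdf(dist_x, z - y) * p for y, p in dist_y.items())


if __name__ == "__main__":
    print(cdf_of_sum(DIE, DIE, 7))    # P{sum of two dice <= 7}
    print(cdf_of_sum(DIE, COIN, 3), cdf_of_sum(COIN, DIE, 3))  # symmetric in X and Y
```

Swapping the roles of the two distributions gives the same value, which is the "symmetrical argument" in the solution.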
Section 7.6 


1. By the strong law of large numbers, $F_n(x, \omega)$ converges a.e. to
$E[I_{\{X_1 \le x\}}] = F(x)$, and

$$F_n(x^-, \omega) = \frac{1}{n} \sum_{k=1}^{n} I_{\{X_k < x\}}(\omega) \to F(x^-) \quad \text{a.e.}$$

(This holds for each fixed $x \in R$, and the exceptional set of measure 0
depends on $x$.)

Assume that $F_n(x, \omega) \to F(x)$ and $F_n(x^-, \omega) \to F(x^-)$ for $\omega \notin A_x$,
where $P(A_x) = 0$. Let $S$ be a countable dense subset of $R$ containing all
discontinuity points of $F$. If $A = \bigcup\{A_x: x \in S\}$, then $P(A) = 0$ since $S$
is countable. If $\omega \notin A$, then $F_n(x, \omega) \to F(x)$ and $F_n(x^-, \omega) \to F(x^-)$
for all $x \in S$. Furthermore, $F_n(\infty, \omega) = 1$, $F(\infty) = 1$, $F_n(-\infty, \omega) = 0$,
$F(-\infty) = 0$. By 7.6.1, $F_n(x, \omega) \to F(x)$ uniformly for $x \in R$.
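The uniform convergence asserted here can be seen in a small simulation. The sketch below is illustrative and rests on assumptions: the samples are taken from the uniform distribution on (0, 1), so that $F(x) = x$, and the function names and the fixed seed are hypothetical choices.

```python
import random


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - x| for a sample assumed to come from uniform(0, 1).

    The supremum is attained at an order statistic, approached from the
    right (value (i+1)/n) or from the left (value i/n).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def simulate(n, seed=0):
    """Sup distance between the empirical and true distribution functions."""
    rng = random.Random(seed)
    return sup_distance_uniform([rng.random() for _ in range(n)])


if __name__ == "__main__":
    for n in (100, 1000, 20000):
        print(n, simulate(n))
```

The printed distances shrink as $n$ grows, roughly at the rate $n^{-1/2}$, although the rate itself is not part of the statement proved above.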


CHAPTER 8 
Section 8.2 


4. If $T^{-1}A \subset A$, then $A^c \subset T^{-1}A^c$, and conversely; also,
$A - T^{-1}A = T^{-1}A^c - A^c$. This shows that the two definitions of
incompressibility are equivalent.

(a) (i) implies (ii): Let $A$ be wandering, and let $B = \bigcup_{n=0}^{\infty} T^{-n}A$. Then
$T^{-1}B = \bigcup_{n=1}^{\infty} T^{-n}A \subset B$, so by (i), $\mu(B - T^{-1}B) = 0$. But
$B - T^{-1}B = A$ since the $T^{-n}A$ are disjoint; hence $\mu(A) = 0$.

(ii) implies (iii): Let $A^{(1)} = A \cap \bigcup_{n=1}^{\infty} T^{-n}A = \{\omega \in A: T^n\omega \in
A \text{ for some } n \ge 1\}$. If $C = A - A^{(1)}$, then $T^{-n}C = \{\omega: T^n\omega \in
A, \text{ but } T^k\omega \notin A, k > n\}$. Thus the $T^{-n}C$ are disjoint, so that $C$ is
wandering. By (ii), $\mu(C) = 0$, and therefore $T$ is recurrent.

(iii) implies (i): Let $T^{-1}A \subset A$. Then (by induction) $T^{-n}A \subset T^{-1}A$,
$n \ge 1$. Thus $T^{-1}A = \bigcup_{n=1}^{\infty} T^{-n}A$. Now

$$A - T^{-1}A = A - \bigcup_{n=1}^{\infty} T^{-n}A = A - A^{(1)},$$

which has measure 0 by (iii). Thus $\mu(A - T^{-1}A) = 0$, proving (i).
(iv) implies (iii): Obvious.
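(i) implies (iv): Let

$$A^{(\infty)} = A \cap \limsup_{n} T^{-n}A = \{\omega \in A: T^n\omega \in A \text{ for infinitely many } n \ge 1\}.$$

If $A \in \mathscr{F}$, let $B = \bigcup_{n=0}^{\infty} T^{-n}A$. Then $T^{-1}B \subset B$, hence by (i),
$\mu(B - T^{-1}B) = 0$. Similarly, $T^{-(k+1)}B \subset T^{-k}B$, hence

$$\mu(T^{-k}B - T^{-(k+1)}B) = 0.$$

But

$$\begin{aligned}
T^{-k}B - T^{-(k+1)}B &= \bigcup_{n=k}^{\infty} T^{-n}A - \bigcup_{n=k+1}^{\infty} T^{-n}A \\
&= \{\omega: T^n\omega \text{ enters } A \text{ for the last time at } n = k\}.
\end{aligned}$$

Since

$$A - A^{(\infty)} = A \cap \bigcup_{k=0}^{\infty} \{\omega: T^n\omega \in A \text{ for the last time at } n = k\},$$

we have

$$\mu(A - A^{(\infty)}) \le \sum_{k=0}^{\infty} \mu(T^{-k}B - T^{-(k+1)}B) = 0.$$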
(b) Any interval of length less than 1 is a nontrivial wandering set.
(c) Let $A$ be a wandering set; since

$$\sum_{n=0}^{\infty} \mu(A) = \sum_{n=0}^{\infty} \mu(T^{-n}A) = \mu\left(\bigcup_{n=0}^{\infty} T^{-n}A\right) \le \mu(\Omega) < \infty,$$

$\mu(A)$ must be 0. Thus $T$ is conservative, hence infinitely recurrent.
Section 8.3 


5. (a) (i) implies (ii): By hypothesis, $U$ has a left and a right inverse, hence
$U$ is one-to-one onto; also, $\langle f, g \rangle = \langle f, U^*Ug \rangle = \langle Uf, Ug \rangle$.
(ii) implies (iii): Take $g = f$.
(iii) implies (ii): Use the polarization identity (3.2.17).
(ii) implies (i): Write $\langle U^*Uf, g \rangle = \langle Uf, U^{**}g \rangle = \langle Uf, Ug \rangle = \langle f, g \rangle$.
Since $f$ and $g$ are arbitrary, $U^*U = I$. But then $(UU^*)U = U$; so if
$U$ is onto, then $UU^* = I$.
(b) This is done exactly as in (a). 
(c) If $Uf = f$, then $U^*Uf = U^*f$, hence by (b), $f = U^*f$. Conversely,
if $U^*f = f$, then

$$\begin{aligned}
\|Uf - f\|^2 &= \|Uf\|^2 - \langle f, Uf \rangle - \langle Uf, f \rangle + \|f\|^2 \\
&= 2\|f\|^2 - \langle U^*f, f \rangle - \langle f, U^*f \rangle \\
&= 2\|f\|^2 - 2\langle f, f \rangle \quad \text{by hypothesis} \\
&= 0.
\end{aligned}$$

(d) If $f_k \in E$, $f_k \to f$, then

$$\begin{aligned}
\|A_m f - A_n f\| &\le \|A_m f - A_m f_k\| + \|A_m f_k - A_n f_k\| + \|A_n f_k - A_n f\| \\
&\le 2\|f - f_k\| + \|A_m f_k - A_n f_k\|,
\end{aligned}$$

and it follows that $\{A_n f\}$ is a Cauchy sequence, so that $f \in E$,
proving $E$ closed. It is immediate that $E$ is a subspace.

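(e) If $f \in M$, then $A_n f = f$, so $\hat{f} = f$. If $f = g - Ug \in N_0$, then
$A_n f = n^{-1}(g - U^n g)$, hence $\|A_n f\| \le 2n^{-1}\|g\| \to 0$.

(f) If $h \in H$, then

$$\begin{aligned}
h \perp N \quad &\text{iff} \quad h \perp N_0 \\
&\text{iff} \quad \langle h, g - Ug \rangle = 0 \text{ for all } g \in H \\
&\text{iff} \quad \langle h - U^*h, g \rangle = 0 \text{ for all } g \in H \\
&\text{iff} \quad U^*h = h \\
&\text{iff} \quad Uh = h \quad \text{by (c)} \\
&\text{iff} \quad h \in M.
\end{aligned}$$

Thus $M = N^{\perp}$, and the result follows.

(g) Since $E = H$, $A_n f$ converges to a limit $\hat{f}$. Write $f = f_1 + f_2$,
where $f_1 \in M$, $f_2 \in N$. Now $A_n f_1 \to f_1$ by (e), and also $A_n f_2 \to 0$.
To see this, choose $g \in N_0$ such that $\|f_2 - g\| < \varepsilon$; then

$$\|A_n f_2\| \le \|A_n(f_2 - g)\| + \|A_n g\| \le \|f_2 - g\| + \|A_n g\| \le \varepsilon + \|A_n g\|.$$

By (e), $A_n g \to 0$, so $A_n f_2 \to 0$. Therefore $A_n f \to f_1 = Pf$.

(h) By definition of $\hat{S}$ and $\hat{T}$, we have $\hat{S}\hat{T} = \hat{T}\hat{S} = I$. By 8.3.1, $\hat{S}$ and $\hat{T}$
are isometries, and since they are invertible, they must be unitary operators.
By (a), if $U = \hat{T}$, then $U^* = \hat{S}$.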

6. If $P$ is not an extreme point, so that a representation of the given form
is possible, then $P_1$ is preserved by $T$, $P_1 \ll P$, and $P_1 \ne P$ (if $P_1 = P$,
then $(1 - \lambda_1)P_1 = \lambda_2 P_2$, so that $P_1$ and $P_2$, being probability measures,
must be identical). By 8.3.12, $P$ is not ergodic.


Conversely, assume that $P$ is not ergodic. If $A$ is an invariant set with
$0 < P(A) < 1$, then for each $B \in \mathscr{F}$,

$$\begin{aligned}
P(B) &= P(A)P(B \mid A) + P(A^c)P(B \mid A^c) \\
&= \lambda_1 P_1(B) + \lambda_2 P_2(B).
\end{aligned}$$

By the end of the proof of 8.3.12, $P_1$ and $P_2$ are preserved by $T$, hence
$P_1, P_2 \in K$. If $P_1 = P_2$, then $P = P_1$; but $P_1(A) = P(A \mid A) = 1 \ne P(A)$.
Therefore $P_1 \ne P_2$, so that $P$ is not an extreme point.

7. (a) implies (b): Apply the pointwise ergodic theorem.
(b) implies (c), (c) implies (d): Obvious.
(d) implies (e): By the pointwise ergodic theorem, $I_A^{(n)}$ converges to $\hat{I}_A$
almost everywhere, hence in probability. By (d), $\hat{I}_A = P(A)$ a.e.
(e) implies (f): Since $I_B^{(n)} \to \hat{I}_B$ a.e., we may multiply by $I_A$ and
integrate to obtain, by the dominated convergence theorem,

$$\frac{1}{n} \sum_{k=0}^{n-1} P(A \cap T^{-k}B) \to E(I_A \hat{I}_B).$$

But if $\mathscr{G}$ is the $\sigma$-field of almost invariant sets,

$$\begin{aligned}
E(I_A \hat{I}_B) &= E[E(I_A \hat{I}_B \mid \mathscr{G})] \\
&= E[\hat{I}_B E(I_A \mid \mathscr{G})] \quad \text{by 8.3.8} \\
&= E(\hat{I}_A \hat{I}_B) \quad \text{by 8.3.9.}
\end{aligned}$$

Under the hypothesis (e), $\hat{I}_A = P(A)$ a.e., $\hat{I}_B = P(B)$ a.e., proving (f).

(f) implies (g): If $A, B \in \mathscr{F}$ and $\varepsilon > 0$, choose $A_0, B_0 \in \mathscr{F}_0$ such that
$P(A \mathbin{\Delta} A_0)$, $P(B \mathbin{\Delta} B_0)$, and $P[(A \cap T^{-k}B) \mathbin{\Delta} (A_0 \cap T^{-k}B_0)]$ are less than
$\varepsilon$ for all $k$ (see the proof of 8.2.7). Then

$$\begin{aligned}
\left| \frac{1}{n} \sum_{k=0}^{n-1} P(A \cap T^{-k}B) - P(A)P(B) \right|
&\le \left| \frac{1}{n} \sum_{k=0}^{n-1} [P(A \cap T^{-k}B) - P(A_0 \cap T^{-k}B_0)] \right| \\
&\quad + \left| \frac{1}{n} \sum_{k=0}^{n-1} [P(A_0 \cap T^{-k}B_0) - P(A_0)P(B_0)] \right| \\
&\quad + |P(A_0)P(B_0) - P(A)P(B)|.
\end{aligned}$$

The first term is less than $\varepsilon$ for all $n$, and the second is less than $\varepsilon$ for
large $n$ by (f). Since the third term is less than $2\varepsilon$, the result follows.

(g) implies (a): Let $A$ be an invariant set, and set $B = A$; then
$n^{-1} \sum_{k=0}^{n-1} P(A \cap T^{-k}B) = P(A)$. By (g), $P(A) = [P(A)]^2$, hence $P(A) = 0$ or 1; thus
$T$ is ergodic.

Section 8.4


3. (a) We have $E(R_n) = 1 + \sum_{k=2}^{n} P(B_k)$, and

$$\begin{aligned}
P(B_k) &= P\{X_k \ne 0, X_k + X_{k-1} \ne 0, \ldots, X_k + \cdots + X_2 \ne 0\} \\
&= P\{S_1 \ne 0, S_2 \ne 0, \ldots, S_{k-1} \ne 0\}
\end{aligned}$$

since the $X_i$ are iid.
Thus $n^{-1}E(R_n)$ is the arithmetic average of a sequence converging
to $P(A)$, hence $n^{-1}E(R_n) \to P(A)$.

(b) The $Z_k$ are iid, with $|Z_k| \le N$. Since $R_{nN} \le Z_1 + \cdots + Z_n$, the strong
law of large numbers yields the desired result.

(c) For any positive integer $n$, $(k - 1)N + 1 \le n \le kN$ for some $k$,
hence $|R_n - R_{kN}| \le N$; therefore

$$\frac{R_n}{n} \le \frac{R_{kN}}{kN} \cdot \frac{kN}{n} + \frac{N}{n}.$$

Since $kN/n \to 1$ as $n \to \infty$, $\limsup_{n \to \infty} n^{-1}R_n \le N^{-1}E(Z_1)$ by (b).
Now $Z_1 = R_N$ and $N$ is arbitrary; thus the result follows from (a).

(d) Since $V_k = 1$ if $X_{k+1} \ne 0$, $X_{k+1} + X_{k+2} \ne 0, \ldots$, and $V_k = 0$
otherwise, $V_k$ can be expressed as $g(X_k, X_{k+1}, \ldots)$ where $g\colon R^{\infty} \to R$ is
measurable relative to $[\mathscr{B}(R)]^{\infty}$. The pointwise ergodic theorem
therefore applies.

(e) The sum $\sum_{k=1}^{n} V_k$ is the number of states visited in the first $n$ steps
that are never revisited. If $i < j$, and $S_i$ and $S_j$ are never revisited,
then $S_i \ne S_j$; thus $\sum_{k=1}^{n} V_k \le R_n$. By (d),
$\liminf_{n \to \infty} n^{-1}R_n \ge E(V_1)$ a.e. But since the $X_k$ are iid,

$$E(V_1) = P\{X_2 \ne 0, X_2 + X_3 \ne 0, \ldots\} = P(A).$$
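The conclusion $n^{-1}R_n \to P(A)$ can be illustrated by simulation. The sketch below makes several assumptions that are not part of the problem: the walk is the simple random walk with $P\{X_k = 1\} = p$ and $P\{X_k = -1\} = 1 - p$, for which the probability of never returning to the origin is the standard value $|2p - 1|$; the function name, seed and step count are hypothetical.

```python
import random


def range_fraction(n, p, seed=0):
    """R_n / n, where R_n is the number of distinct values among S_1, ..., S_n.

    Assumes a simple random walk: each step is +1 with probability p and
    -1 with probability 1 - p (an illustrative choice of step distribution).
    """
    rng = random.Random(seed)
    position = 0
    visited = set()
    for _ in range(n):
        position += 1 if rng.random() < p else -1
        visited.add(position)
    return len(visited) / n


if __name__ == "__main__":
    print(range_fraction(200_000, 0.8))  # compare with |2p - 1| = 0.6
    print(range_fraction(200_000, 0.5))  # recurrent case: the limit is 0
```

In the symmetric case the walk is recurrent, $P(A) = 0$, and the fraction of new states tends to 0, whereas with drift a positive fraction of steps reach states never seen before.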


Section 8.5 


1. Let $\{X_n\}$ and $\{X_n'\}$ be discrete ergodic sequences with entropies $H$ and
$H'$. Flip a coin; if the result is heads, let $X_n'' = X_n$ for all $n$, and if tails,
let $X_n'' = X_n'$ for all $n$. If $p$ is the probability of heads, the limit random
variable is $H$ with probability $p$, and $H'$ with probability $1 - p$, so that
the entropy of $\{X_n''\}$ is $pH + (1 - p)H'$. In part (a), choose $H = H'$,
and in part (b), choose $H \ne H'$. There is no problem in realizing these
choices; for example, if the $X_n$ are independent and take on $r$ values with
equal probability, then $H = \log r$.


Index 


absolute homogeneity, 127 
absolute moments and central moments, 191 
absolutely continuous 
function, 72 
measure (or signed measure), 60, 65 
random variable, 175 
random vector, 177 
abstract Lebesgue integral, 37ff. 
adapted process, 426 
additivity theorem, 46 
adjoint, 164 
algebra of sets, 3 
almost everywhere (a.e.), 48 
almost invariant set, 350 
almost surely (a.s.), 176 
annihilator, 164 
approximation theorem, 19, 32, 40, 92, 94 
atom, 225 


Baire category theorem, 158 

Banach space, 128 

Bernoulli shift, 394 

Bernoulli trials, 170, 199, 309 

Bessel’s inequality, 133 

beta function, 202 

Bochner’s theorem, 296 

Borel measurable function, 36ff. 

Borel sets, 6, 121 

Borel—Cantelli lemma, 68, 285 
second, 238 

bounded linear operator, 142 

bounded sequence of distribution functions, 300 

bounded variation, 74 

branching processes, 251, 282 

Breiman’s gambling model, 284 

Brownian bridge, 406 

Brownian motion, 401 


Cantor function, 79, 175 

Cantor set, 35, 79 

Caratheodory extension theorem, 19 

Cauchy density, 176 

Cauchy sequence, 89 
in measure, 96 

Cauchy—Schwarz inequality, 85, 95, 130 
for sums, 91 

central limit theorem, 290ff., 343 


central moments, 191 
chain rule, 71 
characteristic function, 291, 341 
Chebyshev’s inequality, 88, 192 
closed graph theorem, 163 
closed linear operator, 163 
closed subspace, 133 
complete measure space, 17 
complete orthonormal set, 136 
completeness of $L^p$, 89, 94
completion of a measure space, 18 
complex measure, 71 
complex-valued Borel measurable function, 83 
random variable, 193 
conditional entropy, 376, 387 
conditional expectation, 201 ff., 210 
given a σ-field, 217
conditional probability, 171, 201 ff., 210
given a σ-field, 219
conjugate isometry, 145 
conjugate space, 156 
conservative transformation, 355 
continuity from below and above, 10 
continuity point of a distribution function, 124 
continuous 
functions dense in $L^p$, 92
linear functionals, 144-147 
linear operator, 142 
paths, 399 
process, 399 
random variable, 175 
convergence 
almost everywhere, 48, 96 
almost uniformly, 97 
in distribution, 290, 297 
in measure, 96 
in probability, 96, 297 
of sequences of measurable functions, 96ff. 
of transformed sequences, 335 
of types, 314, 447 
theorems for submartingales, 259ff. 
to a normal distribution, 307ff. 
convex function, 253 
convex set, 133 
convolution, 328 
correlation coefficient, 194 
countably additive set function, 5, 62 




counting measure, 6, 10, 20, 89, 94, 112 
covariance, 194, 343 

Cramér-Wold device, 343

cylinder, 113, 117 


De Morgan laws, 1 
degenerate random variable, 298 
density (density function) 
of a random variable, 175 
of a random vector, 177 
of a signed measure, 68 
difference operator, 27 
differentiating under the integral sign, 53-54 
differentiation 
of measures, 76 
of real-valued functions, 78 
discrete 
probability space, 167 
random variable, 174 
random vector, 177 
distribution function, 22, 24, 28, 30, 337 
of a random variable, 173 
of a random vector, 176 
distribution of a random object, 185 


dominated convergence theorem, 50, 100, 223 


dot product, 130 
dyadic rationals, 464 


Egoroff’s theorem, 98 
empirical distribution function, 331 
ensemble average, 349 
entropy, 376, 378, 386, 391 
ergodic theorems, 355-357, 361 
ergodic transformation, 350 
essential supremum, 93 
events, 166 
prior to a stopping time, 270, 414 
expectation, 188 
exponential density, 175 
extended monotone convergence theorem, 49, 
223 
extended random variable, 173 
extension of a measure, 17, 19, 21, 24, 30 


Fatou’s lemma, 223 

Feller’s theorem, 314, 443 

field of sets, 3 

finite set function, 8 

finitely additive set function, 5 

flow, 345 

Fourier series, 141

Fourier transform, 292, 296 

Fubini’s differentiation theorem, 83 

Fubini’s theorem, 105, 108, 109, 111, 112 

functions of independent random objects are 
independent, 179 


gambler’s ruin, 200 

gamma distribution, 324 
gamma function, 202 
Gaussian random vector, 449
generalized Bernoulli trials, 170
geometric distribution, 328 
Glivenko—Cantelli theorem, 331 
good sets principle, 5, 11, 34, 36 
Gram-Schmidt process, 139 
Gramian, 139 


Hahn-Banach theorem, 153, 155 
Helly’s theorem, 300, 339 
Hermite polynomials, 139 
Hewitt-Savage zero-one law, 246
Hilbert space, 128 
hitting time, 270 
Hölder inequality, 85, 94, 95

for sums, 91 


idempotent linear operator, 144 
improper integrals, 57-58 
incompressible transformation, 355 
increasing function, 22, 28, 337 
indefinite integral, 61, 75 
independent and identically distributed (iid) 
random variables, 241, 308 
independent 
classes of sets, 181 
events, 168 
random variables (extended random 
variables, random objects), 177 
indicator, 37 
infinite sequences of random variables, 196 
infinitely divisible distributions, 322-329 
infinitely recurrent transformation, 355 
initial distribution, 196 
inner measure, 233 
inner product space, 128 
inner product, 128 
space, 128 
integrable function, 39, 83 
integral, 37ff. 
integrating an infinite series term by term, 
53-54 
invariant set, 350 
inversion formula, 292, 341 
irreducible Markov chain, 285 
isometric isomorphism, 137 
isomorphic transformations, 391 
Itô integrals, 426 
Itô's differentiation formula, 433


Jensen’s inequality, 254 

joint distribution function, 177 

joint entropy, 376 

jointly Gaussian random variables, 449 
Jordan—Hahn decomposition theorem, 62 


Kolmogorov 
extension theorem, 118 
strong law of large numbers, 240 
three series theorem, 282 
zero-one law, 245 
Kolmogorov's inequality, 237
Kolmogorov-Sinai theorem, 393
Kronecker lemma, 236 


$L^p$ spaces, 84ff.
last element, 266—270, 274 
law of the iterated logarithm, 410-413 
Lebesgue 
decomposition theorem, 70 
integrable function, 53
integral, 37ff. 
measurable function, 41 
measurable sets, 26, 31, 34 
measure, 26, 31 
set, 81 
Lebesgue—Stieltjes measure, 22, 24, 27, 30 
Legendre polynomials, 139 
length, 6 
Lévy's extension of the Borel-Cantelli lemma,
285
Lévy's theorem, 304, 305, 342
Lévy-Khintchine representation, 327
lim inf, 2 
lim sup, 2 
limit of a sequence of sets, 2 
limit under the integral sign, 53-54 
Lindeberg’s theorem, 307 
line of support theorem, 253 
linear 
functional, 144 
manifold, 133 
operator, 142 
Lipschitz condition, 81 
lower limit, 2 
lower semicontinuous (LSC) functions, 122, 441 
lower variation, 62 
Lyapunov’s condition, 309 


μ-integrable function, 39
Markov chains, 196, 198, 251, 261, 262, 285, 
368—374 
Markov property, 369, 414 
martingale, 248ff., 420 
convergence theorems, 257ff. 
differences (orthogonality of), 280 
maximal ergodic theorem, 357 
mean, 191, 343 
mean ergodic theorem, 365-366 
measurable 
cylinder, 113, 117 
function, 36 
process, 400 
rectangle, 102, 113, 117 
set, 36 
space, 36 
transformation, 345 
measure, 5, 6 
concentrated on a set, 6, 26, 60 
space, 5 
measure-preserving transformation, 52, 345 
measures on infinite product spaces, 113ff. 
minimal σ-field over a class of sets, 4
Minkowski inequality, 86, 94 




for sums, 91 
mixing transformation, 352 
moment-generating property of characteristic 
functions, 299 
moments, 191 
monotone class theorem, 18, 21 
monotone convergence theorem, 46, 222 
multivariate normal distribution, 449 
mutually singular measures, 68 


negative part, 38, 62 
non-anticipating process, 426 
nondegenerate random variable, 314 
nonnegative definite function, 296
norm, 87, 127 
of a linear operator, 142 
normal density, 176, 192, 297 
normed linear space, 127 
normed sums, 317 
nowhere differentiability of Brownian motion 
paths, 408 
null space, 145 


one-sided shifts, 143, 346, 353 
open mapping theorem, 161 
optional sampling theorem, 273 
optional skipping theorem, 257 
optional stopping theorem, 279 
Ornstein’s theorem, 397-398 
orthogonal (perpendicular) 

elements, 132 

complement, 135 

direct sum, 135 
orthonormal 

basis, 135-137 

set, 132 
outer measure, 16, 21, 233 


parallelogram law, 131 
Parseval relation, 136 
path, 399 
permutations, 346, 353 
pointwise convergence of linear operators, 148
pointwise ergodic theorem, 361 
Poisson 

distribution, 321 

process, 399 

type, 324 
Pólya urn scheme, 262
positive contraction operator, 356 
positive part, 38, 62 
positive-homogeneity, 153 
pre-Hilbert space, 128 
principle of uniform boundedness, 158 
probability 

function, 174, 177 

measure, 5, 6

induced by a random variable, 173 
induced by a random vector, 176 
space, 5 




product measure theorem, 102, 105, 109, 111, 112
product 
of measures, 111, 116 
of σ-fields, 114
σ-field, 102
progressively measurable process, 431 
projection, 134, 144 
of a probability measure, 118 
theorem, 135 
Prokhorov’s theorem, 302, 340 
pseudometric, 87 
Pythagorean relation, 132 


quadratic variation of Brownian motion paths, 409-410
queueing process, 287 


Rademacher functions, 248 
Radon-Nikodym
derivative, 68 
theorem, 65, 95 
random 
object, 178 
signs problem, 239, 281 
variable, 173 
vector, 176 
walk, 200, 438 
rectangle, 113, 117 
recurrent states, 285, 288, 438-439 
recurrent transformation, 355 
reflexive space, 157 
regular 
conditional distribution function, 229 
conditional probability, 231 
relatively compact family of finite measures, 
301 
reverse martingales (submartingales, 
supermartingales), 248 
Riemann integral, 55-59
Riemann zeta function, 328
Riemann-Stieltjes integral, 58-59
Riesz 
lemma, 150 
representation theorem, 144, 147 
right-continuous 
family of sigma-fields, 415 
function, 22, 336 
right-semiclosed intervals, 4, 29 
rotations of the circle, 346, 353 


σ-algebra, 4
σ-field, 4
generated by a class of sets, 4
induced by a random object, 216
σ-finite set function, 9
sample space, 166 
section, 102 
semicontinuous functions, 122, 441 
seminorm, 87, 127 
separable Hilbert spaces, 138 
set function, 3
Shannon-McMillan theorem, 384

shifts, 346 

signed measure, 62 

simple 
function, 37 
random variable, 174 

singular measure, 60, 68 

Skorokhod’s theorem (Skorokhod construction), 
334 

Slutsky’s theorem, 332 

solvability theorem, 165 

space spanned by a subset of a normed linear 
space, 135 

stable distributions, 317-320 

standard deviation, 191 

state space, 196

stationary 
probability measure, 347, 353 
sequence of random variables, 347, 353 

Steinhaus’ lemma, 44 

stochastic matrix, 196 

stochastic process, 399 

stopping time, 270, 414 

strong convergence, 159 

strong law of large numbers, 200, 242, 278 

strong Markov property, 416 

subadditivity, 127, 153 

sublinear functional, 153 

submartingale and supermartingale inequalities, 
308 

submartingales, 248ff., 420 

subspace, 133 

summation by parts, 236 

sup norm, 129, 143 

supermartingales, 248ff., 420 

symmetric events, 245 

symmetrization, 281 


tail 

events, 244 

functions, 244

σ-field, 244
theorem of total expectation, 209 
theorem of total probability, 172, 209 
tight family of finite measures, 301 
time average, 349 
Toeplitz lemma, 236 
topological vector space, 128 
total variation, 62 
transient states, 285, 438-440 
transition matrix, 198
transition probabilities, 198 
translation-invariant measures, 34, 35 
translations, 346, 353 
triangular array, 321 
truncation inequality, 303 
two-sided exponential density, 176 
two-sided shifts, 143, 346, 353 
types, convergence of, 314 


uncertainty, 376 
uniform asymptotic negligibility (uan), 313
uniform convergence
in the central limit theorem, 329 
of linear operators, 148 
uniform density, 175 
uniform integrability, 262, 266ff. 
uniformly bounded random variables, 308 
upcrossing theorem, 258, 277 
upper limit, 2 
upper semicontinuous (USC) functions, 122, 
441 
upper variation, 62 


vague convergence of measures, see weak
convergence of measures 

variance, 191 

variation
of a function, 74

of Brownian motion paths, 409 
version of a process, 400 
Vitali-Hahn-Saks theorem, 44


Wald’s theorem, 277 
wandering set, 355 
weak compactness theorem, 300ff. 
weak convergence 
in a normed linear space, 159 
of distribution functions, 125, 290 
of measures, 124, 290, 338, 339 
weak law of large numbers, 198 


zero-one laws, 244-246